Grid Based VM Provisioning - A Presentation by Arunabh Das


1. Grid-based VM Provisioning Using STAR / Nimbus
   Date: 2 Apr 2009
   Presented by: Arunabh Das
   School of Computer Science
   Sources:
   - Enabling Cost-Effective Resource Leases with Virtual Machines, Sotomayor, B., K. Keahey, I. Foster, T. Freeman. HPDC 2007 Hot Topics session, Monterey Bay, CA, June 2007
   - Virtual Workspaces for Scientific Applications, Keahey, K., T. Freeman, J. Lauret, D. Olson. SciDAC 2007 Conference, Boston, MA, June 2007
2. Topics covered in this presentation
   - What are the goals? What problems are we trying to solve?
   - Benefits of grid computing
   - Issues with grid computing
   - Why grid computing != cloud computing
   - The case for grid VMs
   - A brief history of grid VMs
   - Introduction to STAR and provisioning with STAR: schematics and deployment
   - Introduction to Nimbus and the provisioning mechanism with Nimbus
3. What do we seek? What are we looking to find?
   - Ginormous amounts of compute power
   - Available 24x7x365
   - Humongous amounts of storage
   - Also available 24x7x365
   - The ability to access the above from the cupboards that professors, post-doc fellows, and the millions of starving graduate students in North America, Europe, Asia and Africa live in
4. Why? Who needs all that compute power?
   Many people, but just as an example:
   - The data stream from the LHC detector is approximately 300 GB/s
   - The CERN computer center has a dedicated 10 GB/s connection to the counting room
   - 27 TB of raw data + 10 TB of event summary data
   - The LHC Computing Grid has hundreds of Tier 1 and Tier 2 institutions connected via dedicated 10 GB/s links
5. So we still haven't found what we're looking for?
   - Kate Keahey is a scientist at Argonne National Laboratory and a Computation Institute fellow at the University of Chicago
   - She created and leads the Nimbus Project
   - She calls it Infrastructure as a Service (IaaS)
   - Which makes perfect sense!
6. A Brief History of Nimbus
   Milestones from the original timeline graphic, which spanned 2003-2009:
   - Xen released (2003)
   - Research on agreement-based services
   - First WSRF Workspace Service release
   - EC2 goes online (2006)
   - Support for EC2 interfaces
   - First STAR production run on EC2
   - EC2 gateway available
   - Context Broker release
   - Nimbus Cloud comes online
7. Grid Technologies: A Brief Overview
   - Infrastructure ("middleware") for establishing, managing and evolving multi-organization federations
     - Secure, coordinated sharing
     - Dynamic, autonomous, domain independent
     - On-demand, ubiquitous access to computing, data and services
   - Globus Toolkit: an implementation of the most basic capabilities
     - A de facto implementation standard
8. A typical grid use-case
   1. User logs into the Grid (single sign-on): grid-proxy-init
      - Authentication and authorization via the Grid Security Infrastructure (GSI) and the gridmapfile
   2. Finds available resources: Monitoring and Discovery Service (MDS)
   3. Starts a remote computation: Grid Resource Allocation Manager (GRAM)
   4. Transfers data from a remote location: Data Transfer (GridFTP)
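The four steps above can be sketched as the Globus Toolkit command lines a user would actually type. This is a dry-run sketch only: the commands (grid-proxy-init, grid-info-search, globus-job-run, globus-url-copy) are real GT2-era clients, but the host name, job manager, and file paths below are hypothetical placeholders.

```python
def grid_workflow(host="grid.example.org"):
    """Return the shell commands for one pass through the use-case above.

    The host, job manager, and paths are illustrative, not real endpoints.
    """
    return [
        # 1. Single sign-on: create a short-lived proxy credential (GSI)
        "grid-proxy-init",
        # 2. Discover available resources via MDS (LDAP-backed in GT2)
        f"grid-info-search -h {host} -b 'mds-vo-name=local,o=grid'",
        # 3. Start a remote computation through GRAM
        f"globus-job-run {host}/jobmanager-pbs /bin/hostname",
        # 4. Transfer data from a remote location with GridFTP
        f"globus-url-copy gsiftp://{host}/data/out.dat file:///tmp/out.dat",
    ]

for cmd in grid_workflow():
    print(cmd)
```

Note that step 1 is the only one that prompts for a passphrase; the proxy credential it creates is what lets the remaining steps run unattended.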
9. The case for VMs on the grid
   - Most grid applications and grid infrastructure just need to handle heavy computation and heavy lifting of data
   - However, certain applications (e.g. the Nuclear Physics STAR experiment) rely heavily on dynamically loading external libraries depending on the task to be performed
   - Configuring an environment for such an application is complex
     - Deployment on a non-dedicated platform is effort-consuming
     - Even when the application compiles on a new platform, validating it is a controlled process subject to quality assurance and regression testing to ensure physics reproducibility and result uniformity
   - An application (e.g. a physics engine) can rely heavily on dependencies deeply embedded in the environment
   => Porting the application would be easiest if we could take the full software stack from the operating system up, and simply install that environment on remote resources
10. The case for VMs on the grid (contd.)
   - A virtual machine provides a software-based virtualization of a physical host machine
     - Dedicated
     - Configured with a full software stack
     - Once configured, it can be deployed on a remote resource in a matter of milliseconds
   - Resource provisioning via VMs is therefore attractive
11. More benefits of VMs on the grid
   - A scientist can develop his or her application within a familiar environment
   - The environment can be ported between local and remote resources as the need arises
   - This facilitates provisioning resources for an application
   - The virtual machine runs as easily on local resources as on remote resources or resources outsourced commercially
12. A quick look at Virtual Workspaces (STAR)
   - The Virtual Workspace Service is the predecessor of Nimbus and was developed by Kate Keahey and Tim Freeman at ANL
   - The Solenoidal Tracker at RHIC (STAR) is a detector which specializes in tracking the thousands of particles produced by each ion collision at RHIC (the Relativistic Heavy Ion Collider)
   - STAR is a massive detector
     - It is used to search for signatures of the form of matter that RHIC was designed to create: the quark-gluon plasma
     - It is also used to investigate the behavior of matter at high energy densities by making measurements over a large area
   - Provisioning STAR nodes with virtual workspaces is a proof-of-concept strategy developed for the High Energy and Nuclear Physics (HENP) group
13. A Brief Look at VMs
   - A VM is a virtualization abstraction of a physical machine (hardware resources + software infrastructure)
   - Software running on a host supporting VM deployment, typically called a VMM (Virtual Machine Monitor) or hypervisor, is responsible for supporting this abstraction by intercepting and emulating instructions issued by the guest machine
   - The hypervisor provides an interface allowing a client to start, pause, serialize, and shut down multiple guests
   - A VM image is composed of a full image of the VM's RAM, disk images, and configuration files
   - Thus, a VM can be paused, its state serialized, and later resumed at a different time and in a different location
   => This decouples image preparation from deployment => easy migration
   Sources:
   - Enabling Cost-Effective Resource Leases with Virtual Machines, Sotomayor, B., K. Keahey, I. Foster, T. Freeman. HPDC 2007 Hot Topics session, Monterey Bay, CA, June 2007
   - Virtual Workspaces for Scientific Applications, Keahey, K., T. Freeman, J. Lauret, D. Olson. SciDAC 2007 Conference, Boston, MA, June 2007
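The pause/serialize/resume lifecycle above can be modeled as a tiny state machine. This is a toy sketch, not hypervisor code: the `Guest` and `Hypervisor` names are hypothetical, and `pickle` stands in for serializing a real RAM + disk snapshot.

```python
import pickle

class Guest:
    """Toy stand-in for a VM: in reality this would be RAM + disk + config."""
    def __init__(self, name):
        self.name = name
        self.state = "defined"   # defined -> running -> paused

class Hypervisor:
    """Models the start/pause/serialize/resume interface a VMM exposes."""
    def start(self, guest):
        guest.state = "running"

    def pause(self, guest):
        guest.state = "paused"

    def serialize(self, guest):
        # Serializing a paused guest decouples image preparation from
        # deployment: the bytes can be shipped to a different host.
        return pickle.dumps(guest)

    def resume(self, blob):
        # Resume at a different time and (conceptually) a different location.
        guest = pickle.loads(blob)
        guest.state = "running"
        return guest

hv = Hypervisor()
vm = Guest("star-worker-01")
hv.start(vm)
hv.pause(vm)
blob = hv.serialize(vm)        # could be moved to a remote resource
migrated = hv.resume(blob)     # resumed elsewhere, state intact
print(migrated.name, migrated.state)
```

The key property the slide is after is visible in the last two lines: the serialized bytes, not the running host, carry the guest's identity and state.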
14. Paravirtualization
   - A virtualization technique that presents a software interface to virtual machines that is similar, but not identical, to that of the underlying hardware
   - Example of paravirtualization:
     - The virtual machine monitor can present the guest operating system with an intelligent NIC with support for DMA-based sending of packets, even though the NIC on the real system lacks this capability
     - Sending packets is then done entirely by the virtual monitor, and NIC interrupts may be processed by the monitor too
     - Since delivering interrupts to the guest operating system is expensive, performance can improve
   - Who'd a thunk it? Paravirtualization actually helps performance!
15. Virtual Workspace Features
   - The workspace service provides interfaces based on the WSRF
   - It allows an authorized Grid client to deploy, shut down, pause and reactivate VMs
16. Worker node deployment workflow
   - Worker node deployment is requested on demand by an authorized off-site grid client
   - The resource allocation request asks for 2 GB of memory and the full use of a CPU for each virtual node
   - On deployment, each node reports to the Condor head node and joins the Condor pool
   - A web application displays current virtual cluster node information based on Condor pool properties
   - A client can then start jobs on the deployed VMs using GRAM2 deployed on the static CE (compute element)
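The deployment workflow above can be sketched as data plus one function. This is illustrative only: `NodeRequest`, `CondorPool`, and `deploy_workers` are hypothetical names, not the Workspace Service or Condor APIs; only the 2 GB / 1 CPU figures come from the slide.

```python
from dataclasses import dataclass, field

@dataclass
class NodeRequest:
    """Per-node resource allocation request, per the slide."""
    memory_mb: int = 2048   # 2 GB of memory per virtual node
    cpus: float = 1.0       # full use of one CPU

@dataclass
class CondorPool:
    """Toy head node: deployed nodes report in and join the pool."""
    nodes: list = field(default_factory=list)

    def register(self, hostname):
        self.nodes.append(hostname)

def deploy_workers(pool, count, request=NodeRequest()):
    """Simulate an authorized client's on-demand deployment request."""
    for i in range(count):
        # In the real workflow each booted VM reports to the head node;
        # here we just register a placeholder hostname.
        pool.register(f"vm-worker-{i:02d}")
    return pool

pool = deploy_workers(CondorPool(), 3)
print(len(pool.nodes), pool.nodes[0])
```

Once nodes are in the pool, the web application in the slide would simply read the pool's properties, and GRAM2 on the static CE would route jobs to them.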
17. Schematic for provisioning of STAR nodes
   [Diagram: the Workspace Service, driven through the OSG CE (GRAM), provisions STAR nodes on TeraPort nodes and executes new STAR instances.]
18. Current Cloud Providers
   - GoGrid
   - Amazon Web Services
   - Google App Engine
   - Mosso
   - SliceHost
   - Media Temple
   - FlexiScale
   - Joyent
   Although they provide web services and compute power on demand, they are not virtual machines on the grid.
19. Cloud Computing: Everything as a Service
   - Elastic computing
   - Pay-as-you-go
   - Capital expense becomes operational expense
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
20. Cloud Computing: Everything-as-a-Service
   - SaaS: Software as a Service
   - PaaS: Platform as a Service
   - IaaS: Infrastructure as a Service
   The analogy to the real world: it used to be that if you wanted to go to the airport, you could call a cab and pay the cab driver. Then they said: if you pay us, we can let you rent the car, but you can't be setting the car on fire. Now you can lease a car, keep it for as long as you want, and do whatever you want to it; of course, you will be able to do a lot more than just drive to the airport.
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
21. Main problems we're trying to solve
   - Code complexity
   - Resource control
   Source: The Nimbus Toolkit
22. The concept of 'workspaces'
   - Dynamically provisioned environments
   - Environment control
   - Resource control
   - Hardware implementations vs. virtualization
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
23. Nimbus Overview
   - Goal: an open source, extensible IaaS implementation and tools
     - Specifically targeting the scientific community
     - A platform for experimentation with features for scientific needs
     - Set up private clouds (privacy, expense considerations)
   - Tools
     - IaaS layer (Workspace Service)
     - Orchestration layer (Context Broker, gateway)
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
24. Workspace Pilot and the concept of resource leases
   - Resource leases allow users to request direct access to resources, rather than ask for a job to be run on those resources
   - Examples:
     - A static long-term agreement with a hosting company
     - On-demand provisioning of a physical cluster partition with a specified configuration (Cluster-on-Demand)
     - Dynamically deploying a virtual machine for an hour on resources provided by Amazon's EC2 service
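The three example leases differ only in kind, duration, and provider, which suggests modeling a lease as plain data. This is a minimal sketch; the `Lease` class and its field names are hypothetical, not the Workspace Pilot interface.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    """A grant of direct access to resources for a period of time."""
    kind: str              # "static", "cluster-on-demand", or "vm"
    duration_hours: float
    provider: str

examples = [
    # A static long-term agreement with a hosting company
    Lease("static", 24 * 365, "hosting-co"),
    # On-demand provisioning of a physical cluster partition
    Lease("cluster-on-demand", 8, "local-cluster"),
    # A VM dynamically deployed for an hour on Amazon's EC2
    Lease("vm", 1, "ec2"),
]

# The common thread: each grants the resources themselves, not a job slot
# in a scheduler's queue.
print(sorted(lease.kind for lease in examples))
```

Seen this way, a batch job submission is just a degenerate lease whose duration and placement are chosen by the scheduler instead of the user, which is exactly the distinction the slide draws.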
25. Advantages of 'Flying Low' (Workspace Pilot)
   - A user can adapt the resource to their needs
     - Use it to support an interactive session
     - Run computations requiring an application-specific scheduler
     - Support portability tests across a variety of environments
   - Exemplified by 'pilot job' approaches that use a site's batch scheduler installation to deliver a lease rather than submit a job to that scheduler
   Source: Flying Low: Simple Leases with Workspace Pilot. Tim Freeman, Kate Keahey, University of Chicago, ANL
26. Implementation of VWS (Virtual Workspace Service)
   [Diagram: the VWS Service managing a pool of nodes on which workspaces are deployed.]
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
27. The Workspace Service
   - The workspace service publishes information on each workspace as standard WSRF Resource Properties
   - Users can interact directly with their workspaces the same way they would with a physical machine
   - [Diagram: the VWS Service and its pool nodes, labeled as the Trusted Computing Base (TCB)]
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
28. Workspace Service Interfaces and Clients
   - Web Services based
   - Web Service Resource Framework (WSRF)
     - GT-based
   - Elastic Compute Cloud (EC2)
     - Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair
     - Unsupported: availability zones, security groups, elastic IP assignment, REST
     - Used alongside the WSRF interfaces; e.g., the University of Chicago cloud allows you to connect via the cloud client or via the EC2 client
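A front end that implements only part of the EC2 command set, as described above, would need to reject unknown commands up front. This sketch captures the supported subset from the slide; the `is_supported` helper is a hypothetical illustration, not part of Nimbus.

```python
# The EC2 commands the Nimbus front end supports, per the slide.
SUPPORTED = {
    "ec2-describe-images",
    "ec2-run-instances",
    "ec2-describe-instances",
    "ec2-terminate-instances",
    "ec2-reboot-instances",
    "ec2-add-keypair",
    "ec2-delete-keypair",
}

def is_supported(command: str) -> bool:
    """Return True if the EC2 command is in the supported subset."""
    return command in SUPPORTED

# ec2-allocate-address touches elastic IPs, which the slide lists as
# unsupported, so it falls outside the set.
print(is_supported("ec2-run-instances"), is_supported("ec2-allocate-address"))
```

The practical upshot for users is that standard EC2 client tooling works unchanged for the basic run/describe/terminate lifecycle, while features like security groups require the WSRF interfaces or are simply unavailable.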
29. Nimbus Schematic
   [Diagram: cloud, workspace, and context clients talk to the workspace service (via WSRF or EC2 interfaces) and the context broker; the workspace service drives the workspace resource manager, workspace pilot, and workspace control, backed by a storage service; an IaaS gateway forwards requests to EC2 and potentially other providers.]
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
30. Science Clouds
   - Make it easy for scientific projects to experiment with cloud computing
   - Can cloud computing be used for science?
   - Evolve software in response to the needs of scientific projects
   - Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases
   - Federating clouds: moving between cloud resources in academic and commercial space
   Source: Cloud Computing with Nimbus, FNAL, January 2009. Kate Keahey, University of Chicago, ANL
31. Science Cloud Resources
   - University of Chicago (Nimbus):
     - First cloud, online since March 4th 2008
     - 16 nodes of the UC TeraPort cluster, public IPs
   - University of Florida
     - Online since 05/08
     - 16-32 nodes, access via VPN
   - Other Science Clouds
     - Masaryk University, Brno, Czech Republic (08/08); Purdue (09/08)
     - Installations in progress: IU, Grid5K, others
   - Using EC2 for overflow
   - Minimal governance model

32. Nimbus Walkthrough
33. Open Source IaaS Implementations
   - Eucalyptus
     - Open source implementation of EC2
     - UCSB, R. Wolski & team, 06/2008
   - OpenNebula
     - Open source datacenter implementation
     - University of Madrid, I. Llorente & team, 03/2008
   - Cloud-enabled Nimrod-G
     - Monash University, MeSsAGE Lab, 01/2009
   - Industry efforts
     - openQRM, Enomalism

34. The Nimbus Community
   - Committers: Kate Keahey & Tim Freeman (ANL/UC), Ian Gable (UVic)
   - A lot of help from the community
   - Collaborations:
     - Cumulus: S3 implementation (Globus team)
     - EBS implementation with IU
     - Appliance management: rPath and the Bcfg2 project
     - Virtual network overlays: University of Florida
     - Security: Vienna University of Technology

35. Future
   - Increasing importance of appliance providers
   - Cloud computing tools
   - Increased interest in cloud interoperability
     - Standards: "rough consensus & working code"
     - Image formats, contextualization capabilities, cloud interfaces, etc.
   - Cloud markets

36. Interestingly enough...
   The font used in this presentation: Nimbus Roman No9 L