glideinWMS - The Larger Picture

  1. glideinWMS - The Larger Picture
     i.e. Is it something you would be interested in?
     by Igor Sfiligoi (UCSD)
     glideinWMS training
  2. Why this talk?
     If you have never heard of glideinWMS before, you likely have no idea whether this is a product you would be interested in using. This talk presents glideinWMS in a larger context, allowing you to understand what this product is all about.
  3. The basics
     ● glideinWMS has been designed to address the needs of High Throughput Computing (HTC)
     ● Better known as batch processing
     ● In a nutshell, we are trying to facilitate the effective use of a large number of CPUs by a large number of users
  4. High Throughput Computing
     ● The basic premise of HTC is that there is always more demand than available CPUs
     ● We should make good use of those CPUs
     ● Keep them busy, ideally, 24x7x365
     ● Sustained utilization is thus more important than peak performance
     ● Measure of success is FLOPY = Floating Point Operations per Year, not FLOPS = Floating Point Operations per Second
  5. HTC from the user point of view
     ● As a side effect, users must be HTC-aware
     ● There are some negative aspects
     ● No interactive access, only process queuing
       – Usually referred to as user jobs
     ● Waiting in line to get access to CPUs
     ● But the payoff is potentially huge
     ● A single user can use 1000s of CPUs at a time
     ● Performing in a few days computations that would take several years on a single machine
  6. HTC in simplified picture
     [Diagram: users, a repository and a scheduler; annotation: scheduling usually not FIFO]
  7. HTC products
     ● There are many HTC products available
     ● Although most call themselves “batch systems”
     ● A non-exhaustive list:
     ● Condor
     ● PBS, with variants like Torque/Maui
     ● LSF
     ● SGE, also known as Oracle Grid Engine
  8. Why another system?
     ● All of the mentioned HTC systems assume full control of the compute resources (i.e. CPUs)
     ● And there are many places where this is the case
     ● glideinWMS developed to support non-dedicated use of compute resources
     ● i.e. when CPUs are given to the system only for limited duration at a time
  9. Non-dedicated resources
     ● In the past decade, two paradigms emerged
     ● Grid computing
     ● Cloud computing
     ● Both allow a user community to use compute resources they don't own
     ● Often called resource elasticity
     ● Managing a large number of Grid and Cloud resources by hand is impractical
     ● glideinWMS creates an HTC system using them
  10. Grid vs Cloud (a short summary)
      Grid:
      ● Grid computing is basically a federation of HTC clusters
      ● Thus recently called Distributed HTC
      ● Job queuing is a native paradigm
      Cloud:
      ● (Commercial) Clouds are about leasing resources on a pay-as-you-go basis
      ● And they happen to use virtualization
      ● Instances expected to start almost immediately
      ● So-called “scientific clouds” are typically just Grid systems that use virtualization (and a different middleware stack)
  11. Grid vs Cloud (a short summary), same comparison with one callout added
      ● glideinWMS currently optimized for the Grid model
  12. glideinWMS and the Grid (Cloud resources are used in a similar way)
      ● glideinWMS creates an overlay system on top of the various HTC clusters
      ● From the user community point of view, glideinWMS is a single HTC system
      ● Just a dynamic one
      ● glideinWMS completely automates the process
      [Diagram: several HTC clusters]
  13. Implementation and support
      ● glideinWMS heavily based on Condor
      ● Essentially a thin layer on top of it
      ● Most of the software support thus coming from the Condor development team
      ● At University of Wisconsin – Madison
        http://research.cs.wisc.edu/condor/
      ● The glideinWMS-specific layer supported by a team spanning Fermilab, UCSD and ISI
        http://tinyurl.com/glideinWMS
  14. glideinWMS and Condor
      ● Condor handles the HTC system
      ● Most Condor features thus available
      ● glideinWMS role limited to scheduling, configuring and starting the Condor process on the compute resources
      [Diagram; labels in extraction order: HTC glideinWMS Condor CPU Handler User Job Condor Job Repository]
  15. Summary
      ● glideinWMS is an HTC product
      ● i.e. enables effective use of a large number of CPUs by a large number of users
      ● glideinWMS creates an HTC system out of non-dedicated compute resources
      ● e.g. Grid and Cloud resources
      ● glideinWMS is heavily based on Condor
      ● thus benefits from the Condor team support
  16. Pointers
      ● glideinWMS development team is reachable at glideinwms-support@fnal.gov
      ● The official project Web page is http://tinyurl.com/glideinWMS
      ● OSG glidein factory at UCSD
        http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
        http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
  17. Acknowledgments
      ● This document was sponsored by grants from the US NSF and US DOE, and by the UC system
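
The FLOPY argument of slide 4 and the payoff claim of slide 5 can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the peak rates, utilization fractions, CPU counts and queue wait are made-up numbers, the function names are hypothetical, and the turnaround estimate assumes the workload splits into independent jobs that parallelize perfectly.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def flopy(peak_flops: float, utilization: float) -> float:
    """Floating point operations completed in one year at a sustained utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_flops * utilization * SECONDS_PER_YEAR


def turnaround_days(cpu_days: float, cpus: int, queue_wait_days: float = 0.0) -> float:
    """Wall-clock days to finish a workload of independent jobs.

    Assumes perfect parallelism across `cpus` CPUs plus a fixed wait in the queue.
    """
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return queue_wait_days + cpu_days / cpus


if __name__ == "__main__":
    # Hypothetical clusters: a fast one that sits idle half of the time
    # versus a slower one that is kept busy 95% of the time.
    print(f"fast but idle: {flopy(peak_flops=1.0e12, utilization=0.50):.3e} FLOPY")
    print(f"slow but busy: {flopy(peak_flops=0.6e12, utilization=0.95):.3e} FLOPY")

    # Hypothetical workload: 5 CPU-years of independent jobs.
    workload_cpu_days = 5 * 365
    print(f"1 CPU: {turnaround_days(workload_cpu_days, 1):.0f} days")
    print(f"1000 CPUs, 1 day in queue: {turnaround_days(workload_cpu_days, 1000, 1.0):.1f} days")
```

With these illustrative numbers the slower but busier cluster delivers more operations per year, and even a full day of waiting in line is small compared with the years saved by running on 1000 CPUs.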

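The overlay idea of slides 12 and 14 can be illustrated with a toy model. This is not the glideinWMS or Condor API: `Pilot`, `provision`, the site names and the lease length are all hypothetical, pilots run one after another instead of concurrently, and the job matching that Condor performs in the real system is reduced to popping from a queue. It only shows the division of labor: placeholder processes are started on CPUs that are available for a limited time, and user jobs are pulled from a shared repository into them.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pilot:
    """Stand-in for a glidein: holds one CPU at a site for a limited lease."""
    site: str
    lease: int  # time units the site lets us keep the CPU (assumed value)
    completed: list = field(default_factory=list)

    def run(self, job_queue: deque) -> None:
        remaining = self.lease
        # Pull user jobs from the shared queue while the next one still fits.
        while job_queue and job_queue[0][1] <= remaining:
            name, length = job_queue.popleft()
            remaining -= length
            self.completed.append(name)


def provision(job_queue: deque, sites: dict, lease: int) -> list:
    """Start pilots on free slots only while user jobs are still waiting."""
    pilots = []
    for site, free_slots in sites.items():
        for _ in range(free_slots):
            if not job_queue:
                return pilots  # no demand left, request no more resources
            pilot = Pilot(site, lease)
            pilot.run(job_queue)
            pilots.append(pilot)
    return pilots


if __name__ == "__main__":
    # Hypothetical user jobs as (name, length) and hypothetical free slots per site.
    jobs = deque([("a", 3), ("b", 2), ("c", 4), ("d", 1)])
    for pilot in provision(jobs, {"site_A": 1, "site_B": 2}, lease=5):
        print(pilot.site, pilot.completed)
```

From the user side there is just one queue of jobs; which site ends up running each job depends only on where a pilot happened to start.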