glideinWMS - The Larger Picture
1. glideinWMS training
glideinWMS - The Larger Picture
i.e. Is it something you would be interested in?
by Igor Sfiligoi (UCSD)
glideinWMS training glideinWMS - The Larger Picture 1
2. Why this talk?
If you have never heard of glideinWMS before,
you likely have no idea whether this is
a product you would be interested in using.
This talk presents
glideinWMS in a larger context,
allowing you to understand
what this product is all about.
3. The basics
● glideinWMS has been designed to address the
needs of High Throughput Computing (HTC)
● Better known as batch processing
● In a nutshell, we are trying to facilitate
the effective use
of a large number of CPUs
by a large number of users
4. High Throughput Computing
● The basic premise of HTC is that there is
always more demand than available CPUs
● We should make good use of those CPUs
● Keep them busy, ideally, 24x7x365
● Sustained utilization is thus
more important than peak performance
● Measure of success is
FLOPY = Floating Point Operations per Year
not
FLOPS = Floating Point Operations per Second
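To make the per-year versus per-second distinction concrete, here is a minimal arithmetic sketch. The function name `flopy` and both sets of numbers are invented for illustration; they are not measurements from any real system.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a non-leap year


def flopy(peak_flops: float, utilization: float) -> float:
    """Floating point operations delivered in one year.

    peak_flops  -- peak rate in operations per second
    utilization -- fraction of the year the CPUs are actually busy (0..1)
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_flops * utilization * SECONDS_PER_YEAR


# Two hypothetical systems: one with twice the peak rate but mostly idle,
# one slower but kept busy almost all the time.
fast_but_idle = flopy(peak_flops=2e12, utilization=0.30)
slower_but_busy = flopy(peak_flops=1e12, utilization=0.95)

print(f"fast but idle:   {fast_but_idle:.3e} operations/year")
print(f"slower but busy: {slower_but_busy:.3e} operations/year")
```

Under these assumed numbers the slower system delivers more work per year, which is the sense in which sustained utilization matters more than peak performance.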
5. HTC from the user point of view
● As a side effect, users must be HTC-aware
● There are some negative aspects
● No interactive access, only process queuing
– Usually referred to as user jobs
● Waiting in line to get access to CPUs
● But the payoff is potentially huge
● A single user can use 1000s of CPUs at a time
● Performing in a few days
computations that would
take several years on a single machine
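The trade-off between waiting in line and the payoff can be estimated with simple arithmetic. This is a rough sketch that assumes the work splits into fully independent jobs and that all CPUs become available at once after the wait; the function name and the numbers are hypothetical.

```python
def wall_clock_days(cpu_years: float, cpus: int, wait_days: float = 0.0) -> float:
    """Estimated days until completion.

    cpu_years -- total work, in years of a single CPU
    cpus      -- number of CPUs used in parallel (assumes perfect scaling)
    wait_days -- time spent queued before the CPUs are granted
    """
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return wait_days + cpu_years * 365 / cpus


# Hypothetical workload: 5 CPU-years of independent jobs.
print(wall_clock_days(cpu_years=5, cpus=1))                    # single machine
print(wall_clock_days(cpu_years=5, cpus=1000, wait_days=1.0))  # HTC, 1 day in queue
```

Real workloads rarely scale perfectly, so the result is best read as a lower bound on the turnaround time.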
6. HTC in simplified picture
[Figure: Repository and Scheduler. User scheduling is usually not FIFO.]
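One common alternative to FIFO is to favor the user who has consumed the least so far. The sketch below illustrates that idea only; it is an assumption made for illustration, not the policy of Condor or any specific batch system, and all names in it are hypothetical.

```python
from collections import deque


def next_job(queues, usage):
    """Pick a job from the waiting user with the lowest accumulated usage.

    queues -- maps user name to a deque of that user's queued jobs
    usage  -- maps user name to CPU time consumed so far
    Ties are broken by user name so the result is deterministic.
    """
    waiting = [user for user, jobs in queues.items() if jobs]
    if not waiting:
        return None
    user = min(waiting, key=lambda u: (usage.get(u, 0.0), u))
    return user, queues[user].popleft()


def run(queues, job_cost=1.0):
    """Drain all queues on a single CPU, charging job_cost per job."""
    usage = {}
    order = []
    while (picked := next_job(queues, usage)) is not None:
        user, job = picked
        usage[user] = usage.get(user, 0.0) + job_cost
        order.append(job)
    return order


# alice submitted four jobs before bob submitted one.
queues = {"alice": deque(["a1", "a2", "a3", "a4"]), "bob": deque(["b1"])}
order = run(queues)
print(order)  # ['a1', 'b1', 'a2', 'a3', 'a4']
```

With strict FIFO, bob's single job would wait behind all four of alice's; here it runs second.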
7. HTC products
● There are many HTC products available
● Although most call themselves “batch systems”
● A non-exhaustive list:
● Condor
● PBS, with variants like Torque/Maui
● LSF
● SGE, also known as Oracle Grid Engine
8. Why another system?
● All of the mentioned HTC systems
assume full control
of the compute resources (i.e. CPUs)
● And there are many places where this is the case
● glideinWMS was developed to
support non-dedicated use
of compute resources
● i.e. when CPUs are given
to the system only
for limited duration at a time
9. Non-dedicated resources
● In the past decade, two paradigms emerged
● Grid computing
● Cloud computing
● Both allow a user community to use compute
resources they don't own
● Often called resource elasticity
● Managing a large number of Grid and Cloud
resources by hand is impractical
● glideinWMS creates an HTC system using them
10. Grid vs Cloud
(a short summary)
Grid:
● Grid computing is basically a federation of HTC clusters
● Thus recently called Distributed HTC
● Job queuing is a native paradigm
Cloud:
● (Commercial) Clouds are about leasing resources on a pay-as-you-go basis
● And they happen to use virtualization
● Instances expected to start almost immediately
● So-called “scientific clouds” are typically
just Grid systems that use virtualization
(and a different middleware stack)
11. Grid vs Cloud
(a short summary)
● glideinWMS is currently optimized for the Grid model
12. glideinWMS and the Grid
(Cloud resources are used in a similar way)
● glideinWMS creates
an overlay system on top of
the various HTC clusters
● From the user community
point of view, glideinWMS is
a single HTC system
● Just a dynamic one
● glideinWMS
completely automates
the process
[Figure: multiple HTC clusters joined into one overlay]
13. Implementation and support
● glideinWMS is heavily based on Condor
● Essentially a thin layer on top of it
● Most of the software support thus comes
from the Condor development team
● At University of Wisconsin – Madison
http://research.cs.wisc.edu/condor/
● The glideinWMS-specific layer supported by a
team spanning Fermilab, UCSD and ISI
http://tinyurl.com/glideinWMS
14. glideinWMS and Condor
● Condor handles the HTC system
● Most Condor features thus available
● glideinWMS role limited to scheduling,
configuring and starting the Condor process
on the compute resources
[Figure labels: HTC, glideinWMS, Condor CPU Handler, User Job, Condor Job Repository]
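The division of labor described above can be illustrated with a toy model. Everything here is hypothetical: the classes `Glidein` and `Pool`, the function `provision`, and the site names are invented stand-ins and do not correspond to any real glideinWMS or Condor API. The model also simplifies heavily by running jobs one after another on each leased CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Glidein:
    """A CPU handed to the overlay for a limited time (hypothetical model)."""
    site: str
    lease_hours: float
    used_hours: float = 0.0

    def remaining(self) -> float:
        return self.lease_hours - self.used_hours


@dataclass
class Pool:
    """Stand-in for the Condor side: holds user jobs and the CPUs that joined."""
    jobs: list  # list of (job name, hours needed)
    glideins: list = field(default_factory=list)

    def match(self):
        """Give each queued job to the first glidein with enough lease left."""
        assignments, unmatched = [], []
        for name, hours in self.jobs:
            slot = next((g for g in self.glideins if g.remaining() >= hours), None)
            if slot is None:
                unmatched.append((name, hours))
            else:
                slot.used_hours += hours
                assignments.append((name, slot.site))
        self.jobs = unmatched
        return assignments


def provision(pool: Pool, sites, lease_hours: float) -> None:
    """Stand-in for the glideinWMS role: start one glidein per site when jobs wait."""
    if pool.jobs:
        for site in sites:
            pool.glideins.append(Glidein(site, lease_hours))


pool = Pool(jobs=[("j1", 6.0), ("j2", 6.0), ("j3", 6.0)])
provision(pool, sites=["siteA", "siteB"], lease_hours=8.0)
assignments = pool.match()
print(assignments)  # [('j1', 'siteA'), ('j2', 'siteB')]
print(pool.jobs)    # [('j3', 6.0)]
```

In this sketch the third job stays queued because neither lease has six hours left, which is where a real system would need to request more resources.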
15. Summary
● glideinWMS is an HTC product
● i.e. enables effective use of a large number
of CPUs by a large number of users
● glideinWMS creates an HTC system out of
non‑dedicated compute resources
● e.g. Grid and Cloud resources
● glideinWMS is heavily based on Condor
● thus benefits from the Condor team support
16. Pointers
● glideinWMS development team is reachable at
glideinwms-support@fnal.gov
● The official project Web page is
http://tinyurl.com/glideinWMS
● OSG glidein factory at UCSD
http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
17. Acknowledgments
● This document was sponsored by grants from
the US NSF and US DOE,
and by the UC system