The document discusses bringing OSG (Open Science Grid) users to the PRP (Pacific Research Platform) Kubernetes cluster. It outlines how OSG focuses on distributed high-throughput computing and opportunistic use of resources. It describes how two OSG user communities, IceCube and LIGO, have begun using PRP resources through Kubernetes and HTCondor. Challenges addressed include providing CVMFS and dealing with resource contention between multiple OSG users on the shared PRP cluster. Initial usage statistics for IceCube and LIGO workloads on PRP resources are provided.
1. Bringing OSG users to the PRP Kubernetes Cluster
Presented by Igor Sfiligoi, UCSD
May 13, 2019
2. Outline
• PRP and Kubernetes
• OSG Overview
• OSG and Opportunistic use
• IceCube and LIGO science
• CVMFS and Unprivileged containers
• Lack of nested containerization
• Dealing with multiple users
• Some stats
3. PRP and Kubernetes
• Pacific Research Platform (PRP) was originally created as a regional networking project
• Establishing end-to-end links between 10Gbps and 100Gbps
• It has also recently become a major resource provider
• About 3.5k CPU cores and 330 GPUs
• About 2PB of storage space
• Kubernetes chosen for resource management
• Industry standard – Large and active development and support community
• Container based – More freedom for users
• Flexible scheduling – Allows for easy mixing of service and user workloads
4. The Open Science Grid (OSG)
• The Open Science Grid (OSG) is an NSF-funded effort
• Open to all open science irrespective of discipline, but manages the following groups differently:
• The 4 “big science” projects
• Multi-institutional Science Teams
• Campus Research Support Organizations
• Individual researchers
• Focused on supporting dHTC workflows
• Due to almost perfect scalability
• Owns no compute resources
• Acts as a glue between and among
resource providers and users
Advancing Open Science through distributed High Throughput Computing (dHTC)
5. A few words about dHTC
The challenge in successful dHTC is two-fold:
• Separate a big computing problem into many individually schedulable small problems.
• Minimize your requirements in order to maximize the raw capacity that you can effectively use.
As a computing paradigm, dHTC is special because, by definition, it scales perfectly.
• When a researcher understands how to partition their workflow into many individually schedulable compute problems, they can scale out with ease to seemingly arbitrary scales of computing.
• When one integrates all IT resources at Universities, National Labs, and the commercial cloud, one arrives at a near infinite resource pool.
Ingenious Parallelism (Attribution: from Frank Wuerthwein’s OSG talk)
6. dHTC and Opportunistic Use
• By creating a global virtual resource pool, OSG can both
• Help distributed organizations to spread their usage among many sites
• Give unused resources to external users – Opportunistic use
• Opportunistic use must be minimally invasive
• Resource owners should get their resources back when they need them again (ideally, within seconds)
• dHTC is ideally suited for this operational model
• All tasks are short and independent
• An opportunistic task can be killed without much loss; the system will automatically reschedule it somewhere else
Never let a cycle go unused!
7. OSG use of PRP
In this talk, OSG integration with PRP is by means of opportunistic use
• PRP currently has very little resource contention – GPU and CPU cycles were going unused
• OSG has several user communities who could use more resources
Kubernetes naturally allows for opportunistic use by means of priorities
• Low priority containers are only scheduled if there is no contention by higher priority ones
• When a higher priority container needs the resource, the lower priority container is instantly killed
OSG is using PRP also for running some of its internal services
• But not part of this talk
• Other future uses possible, too
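To make the opportunistic model concrete, the sketch below shows how a low-priority class and an opportunistic OSG pod could be expressed in Kubernetes. The class name, priority value, and image are illustrative assumptions, not the actual PRP configuration.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: opportunistic              # hypothetical class name
value: -10                         # below the default pod priority of 0, so these pods are preempted first
globalDefault: false
description: "Low priority for opportunistic OSG workloads"
---
apiVersion: v1
kind: Pod
metadata:
  name: osg-worker-example         # hypothetical pod name
spec:
  priorityClassName: opportunistic
  containers:
  - name: worker
    image: example/htcondor-execute:latest   # placeholder image with HTCondor binaries
    resources:
      requests:
        cpu: "1"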
8. OSG users on PRP
PRP was the first Kubernetes-based resource provider for OSG
• Lots of unknowns
• So we decided to only target a few, well behaved user communities as a starting point
We started with the two smaller of the four big science projects
• IceCube
• LIGO
9. IceCube
• The IceCube Neutrino Observatory is designed to observe the cosmos from deep within the South Pole ice. Encompassing a cubic kilometer of ice, IceCube searches for nearly massless subatomic particles called neutrinos.
• One of the most compute intensive activities is simulating the properties of ice and its photon propagation in the presence of neutrinos.
Direct Photon Propagation
10. LIGO
• LIGO’s mission is to open the field of gravitational-wave astrophysics through the direct detection of gravitational waves. LIGO detectors use laser interferometry to measure the distortions in space-time occurring between stationary, hanging masses (mirrors) caused by passing gravitational waves.
• The main compute activities of LIGO are event template timeseries searches and parameter fitting.
11. Integration with OSG
OSG natively does not know how to talk to Kubernetes
• We needed a batch system interface
Instantiated an HTCondor pool as a Kubernetes/Containerized deployment
• Pretty straightforward, just needed to create images with HTCondor binaries in them
• Configuration-wise, not much different than a bare metal setup
• HTCondor deals gracefully with dynamic host names
The OSG gateway (known as a CE) was also containerized
• Here I needed some elevated privileges
• Must use host IP and DNS due to the use of GSI/X.509
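As a rough illustration of the approach described above, an HTCondor execute pool can be expressed as a standard Kubernetes Deployment. The image name, central manager address, replica count, and resource sizes below are illustrative assumptions, not the actual PRP setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: osg-htcondor-execute       # hypothetical name
spec:
  replicas: 4                      # how many opportunistic worker pods to keep requested
  selector:
    matchLabels:
      app: osg-htcondor-execute
  template:
    metadata:
      labels:
        app: osg-htcondor-execute
    spec:
      priorityClassName: opportunistic        # the hypothetical low-priority class from the earlier sketch
      containers:
      - name: execute
        image: example/htcondor-execute:latest   # image with the HTCondor binaries baked in
        env:
        - name: CONDOR_HOST                       # central manager the worker reports to (placeholder address)
          value: condor-manager.example.org
        resources:
          requests:
            cpu: "4"
            memory: 8Gi

Because HTCondor deals gracefully with dynamic host names, the workers can join the pool regardless of which pod names Kubernetes assigns.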
12. The drawbacks of containerization
OSG normally asks resource providers to provide two services on all execute nodes
• CVMFS – A FUSE-mounted global filesystem
• Singularity – So users can launch their own containers
Both need elevated privileges
• Could not just put them in the HTCondor execute Image and run as a regular Kubernetes Container
13. CVMFS and Kubernetes CSI
Kubernetes’ answer is CSI (Container Storage Interface)
• Turns out it is a common enough problem (think Box and Netflix)
• From the technical point of view, it is implemented as admin-deployed, privileged side containers
Had to fix the CERN-provided version
• CERN had developed against a beta version of the API (now deprecated)
• Dima Mishin did the re-factoring – Contributed back the changes
• Also switched from the CERN-internal version to the OSG-provided version of the RPMs
Some minor problems still remain
• Normally, CVMFS relies on autofs
• But autofs does not work in side-containers, so explicit mounting is needed
14. CVMFS and Kubernetes CSI
Working well enough now
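For illustration, mounting a CVMFS repository through a CSI driver looks roughly like the sketch below. The provisioner name and the repository parameter follow the CERN cvmfs-csi project and are assumptions that may differ from the exact version deployed on PRP.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cvmfs                    # hypothetical name
provisioner: cvmfs.csi.cern.ch       # the CSI driver runs as admin-deployed, privileged side containers
parameters:
  repository: oasis.opensciencegrid.org   # OSG software repository, as an example
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cvmfs-oasis
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: csi-cvmfs
  resources:
    requests:
      storage: 1Gi                   # nominal size; CVMFS is a read-only network filesystem

A worker pod then mounts the claim as a read-only volume under the usual /cvmfs path, so jobs see the same filesystem layout they expect on any OSG execute node.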
15. No Singularity
Currently no real solution for Singularity
• OSG pilots can only invoke Singularity directly
• But nested containerization is not supported
Have been creating user specific HTCondor execute Images
• With only a couple supported users it is doable
• But still time consuming
• And will not scale
Looking at longer term alternatives
• HTCondor adding native support for Kubernetes
• OSG pilots could then provide the container Image to launch
• Exploring dynamic side-container options in Kubernetes
16. Dockerfile blues
Users have images they use on OSG, but
• Usually expect to be root when running
• Optimized for Singularity, not Kubernetes/Docker
• Differences in GPU driver integration particularly nasty
Need to inject OSG environment and HTCondor
• Users not root anymore
• May have conflicts in required libraries
• Environment differences
Easier with access to their Dockerfile
• It is usually just a set of yum installs and the like
• Adding the PRP-specific additions then gets easy
• But a few back-and-forths usually still needed
17. Contention in Opportunistic use
We started with a single OSG user
• Using all opportunistic resources was easy
• Just keep enough low-priority pods in the system
Adding a second user was easy, too
• First was a GPU user, the second was a CPU user
• No contention between them
First user wants the same resources as the second user
• They use different container Images
• Contention!
18. Learning to deal with contention
OSG pods currently stay alive even if no work available
• Was not a problem in the absence of opportunistic contention
• Should be solvable, similar to how pilots operate in OSG
Kubernetes not good at contention management
• Not with the default scheduler
• Basically priority-FIFO
Currently manually adjusting pressure
• A couple of times a week was good enough so far
• But will need a better solution longer term
19. OSG a special case in Nautilus
• OSG “user” is different than other Nautilus users
• Not a regular user – Preemption-tolerant, low priority
• Not a service – Heavy users of GPUs and CPUs
• Nautilus admins had to create special rules for us
• E.g. it is OK to “waste” GPUs
• But the same rules would likely apply to any opportunistic user
20. Some stats – IceCube GPU
• First use case
• over 2 months now
• Periods of demand
• Followed by only small bursts of requests
21. Some Stats – CPU usage
• Started with public LIGO
• They had no GPU needs
• But progress was slow
• Recently added IceCube
• Output of the CPU jobs is needed to run the GPU jobs
• Should result in higher GPU demand
22. Some stats – LIGO GPU
• And the LIGO experiment now needs GPUs, too
• Although demand still low
23. Preemption in action
• Kubernetes will automatically regulate number of slots
• Kill containers when higher-priority users need them
• Re-start the OSG containers when nobody else is requesting them
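As a hedged illustration of what such preemption looks like: a regular, default-priority pod requesting a GPU can displace one of the low-priority OSG pods sketched earlier. All names and the image below are hypothetical.

apiVersion: v1
kind: Pod
metadata:
  name: nautilus-user-job            # hypothetical regular (default priority) user pod
spec:
  # No priorityClassName: the pod gets the default priority (0), which is higher
  # than the negative-valued "opportunistic" class, so the scheduler may evict an
  # OSG pod to free up the GPU. The Deployment controller will later re-create the
  # evicted OSG pod once capacity frees up again.
  containers:
  - name: job
    image: example/user-workload:latest      # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1                     # requests a GPU currently held by an opportunistic pod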
25. Acknowledgements
• PRP/TNRP is supported by US National Science Foundation (NSF) awards CNS-1456638, CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-1450871 and OAC-1659169.
• OSG Multi-Messenger Astrophysics activities are supported by US National Science Foundation (NSF) award OAC-1841530.