To keep up with the growth of data analysis needs in the life sciences, it is becoming necessary to use distributed and federated compute and storage resources. The Galaxy application can be used as a locally deployed service, in the cloud, or via any of the public sites. In this talk, we'll look at ongoing efforts to unify the compute resources available to Galaxy to enable higher throughput of user jobs.
Enabling Cloud Bursting for Life Sciences within Galaxy
1. Enabling Cloud Bursting for Life Sciences within Galaxy
Enis Afgan
Johns Hopkins University
Galaxy Team
Slides available at bit.ly/gxy-bursting
2. What is Galaxy?
• A data analysis and integration tool
• A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
• Open source software that makes integrating your own tools and data and customizing for your own site simple
3. usegalaxy.org or any of the other 60+ public servers
$ hg clone bitbucket.org/galaxy/galaxy-dist
$ sh run.sh
8. Burst Architecture
1. Galaxy dynamic job destination framework
2. Galaxy CloudMan cluster with Pulsar
3. A job destination mapper function
[Diagram: Galaxy's <dynamic job destination framework/> uses f(mapper) to choose between the Local DRM and two CloudMan clusters running Pulsar]
9. Pulsar
A standalone job manager server for Galaxy
Can be deployed on dedicated or transient servers (even MS Windows!)
Handles data staging and remote job execution
Pulsar job:
• Stage data
• Submit job
• Monitor job
• Send back the data
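The four steps above can be illustrated with a small local simulation. This is only a sketch of the stage, submit, monitor, and send-back cycle using the Python standard library; it is not Pulsar's API or implementation, and the function name `run_job_cycle`, the `output.txt` file name, and the convention of passing input and output paths as the last two command arguments are all assumptions made for illustration.

```python
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_job_cycle(input_path, output_dir, command):
    """Illustrative stage -> submit -> monitor -> send-back cycle.

    Hypothetical helper, not part of Pulsar. `command` is a list of
    arguments; the staged input and output paths are appended to it.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    with tempfile.TemporaryDirectory() as staging:
        staging = Path(staging)

        # 1. Stage data: copy the input into the job's working directory.
        staged_input = staging / input_path.name
        shutil.copy(input_path, staged_input)
        staged_output = staging / "output.txt"

        # 2. Submit job: start the command without blocking on it.
        proc = subprocess.Popen(command + [str(staged_input), str(staged_output)])

        # 3. Monitor job: poll until the process has exited.
        while proc.poll() is None:
            time.sleep(0.1)
        if proc.returncode != 0:
            raise RuntimeError(f"job failed with exit code {proc.returncode}")

        # 4. Send back the data: copy the result to the caller's directory.
        output_dir.mkdir(parents=True, exist_ok=True)
        result = output_dir / staged_output.name
        shutil.copy(staged_output, result)
        return result


if __name__ == "__main__":
    # Demo job: upper-case the contents of the staged input file.
    script = "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"
    with tempfile.TemporaryDirectory() as scratch:
        source = Path(scratch) / "reads.txt"
        source.write_text("acgt")
        result = run_job_cycle(source, Path(scratch) / "results",
                               [sys.executable, "-c", script])
        print(result.read_text())
```

In a real deployment the staging and collection steps would cross a network boundary between Galaxy and the remote server; here everything runs on one machine so the control flow is easy to follow.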
11. 2. CloudMan with Pulsar
A. Launch a Galaxy on the Cloud instance
B. Enable Pulsar service
C. Add the instance as a destination in job config
Tool availability
• Direct tool install
• Docker images
12. 3. Job mapper function
Determine job destination at runtime
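import pyslurm

def cloud_burst():
    # Return the ID of the job destination to use for this job.
    # Query Slurm for the current state of every node in the local cluster.
    n = pyslurm.node()
    nodes_state = n.get()
    # Collect the local nodes that report CPUs.
    available_nodes = []
    for node in nodes_state.values():
        if node['total_cpus'] > 0:
            available_nodes.append(node)
    # No local nodes available: burst to the remote Pulsar destination.
    if not available_nodes:
        return 'pulsar_nectar_galaxy'
    # Otherwise keep the job on the local cluster.
    return 'drmaa_runner'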
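The decision logic in the mapper can be separated from the Slurm query so that it can be exercised without a cluster. The following is a minimal sketch under that assumption: `choose_destination` and its `is_available` parameter are hypothetical names introduced here, the default check mirrors the slide's `total_cpus > 0` test, and the `free_cpus` key in the usage lines is made up for illustration rather than taken from pyslurm.

```python
def choose_destination(nodes_state,
                       local_dest="drmaa_runner",
                       burst_dest="pulsar_nectar_galaxy",
                       is_available=None):
    """Return local_dest if any node counts as available, else burst_dest.

    nodes_state is a dict mapping node names to per-node dicts, the shape
    the slide's code iterates over. is_available is a hypothetical hook
    for swapping in a different availability check.
    """
    if is_available is None:
        def is_available(node):
            # Same test as on the slide: the node reports at least one CPU.
            return node.get("total_cpus", 0) > 0

    if any(is_available(node) for node in nodes_state.values()):
        return local_dest
    return burst_dest


# Usage with hand-written node states instead of a live Slurm query.
print(choose_destination({"node1": {"total_cpus": 8}}))
print(choose_destination({}))
# A stricter, assumed check based on an illustrative 'free_cpus' key.
print(choose_destination({"node1": {"total_cpus": 8, "free_cpus": 0}},
                         is_available=lambda node: node["free_cpus"] > 0))
```

Passing the node state in as an argument keeps the rule a pure function, so the same logic can be checked against different cluster conditions before it is wired into a job configuration.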