Enabling Cloud Bursting for Life Sciences within Galaxy
Enis Afgan
Johns Hopkins University
Galaxy Team
Slides available at bit.ly/gxy-bursting
What is Galaxy?
•  A data analysis and integration tool
•  A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data, and permanent storage
•  Open source software that makes it simple to integrate your own tools and data and to customize for your own site
usegalaxy.org or any of the other 60+ public servers
$ hg clone https://bitbucket.org/galaxy/galaxy-dist
$ cd galaxy-dist
$ sh run.sh
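[Figure: Galaxy deployment models. Local: a Galaxy server with its own tools, data, indices, DB, and compute resources, shown alongside separate Galaxy instances for RNA-Seq, Assembly, and Quality Control (QC). Federated: one Galaxy with a DB and an object store interface (S3, Swift) that reaches sites A, B, and C through Pulsar (local for one site), each site with its own indices, data, and tools, while keeping artifact and job provenance for RNA-Seq, Assembly, and QC. Logos: Galaxy, CloudMan.]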
Focus on Cloud Bursting
Peak usage scenarios
Resource heterogeneity
Software licensing
Software installation restrictions
National cyberinfrastructure resource access
Per-user, merit-based resource access
Burst Triggers
When?
Resource capacity
Job requirements
Data locality
System configuration
User preferences
Where?
Remote resource availability
Cost
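The "When?" and "Where?" triggers above can be combined into one decision. The following is a minimal sketch only, assuming hypothetical inputs (`free_local_cores`, `data_is_local_only`, `user_allows_burst`) and a hypothetical `Remote` record; none of these names come from Galaxy itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Remote:
    """Hypothetical description of a remote resource (not a Galaxy type)."""
    name: str
    available: bool
    cost_per_hour: float


def should_burst(free_local_cores: int, job_cores: int,
                 data_is_local_only: bool, user_allows_burst: bool) -> bool:
    """'When?': burst only if the local resource cannot fit the job,
    the data may leave the site, and the user has not opted out."""
    if data_is_local_only or not user_allows_burst:
        return False
    return job_cores > free_local_cores


def pick_remote(remotes: List[Remote],
                max_cost_per_hour: float) -> Optional[Remote]:
    """'Where?': cheapest available remote within budget, or None."""
    candidates = [r for r in remotes
                  if r.available and r.cost_per_hour <= max_cost_per_hour]
    return min(candidates, key=lambda r: r.cost_per_hour, default=None)
```

Keeping the two questions in separate functions mirrors the slide: a site can change its capacity policy without touching how a remote resource is selected.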
Burst Architecture
1.  Galaxy dynamic job destination framework
2.  Galaxy CloudMan cluster with Pulsar
3.  A job destination mapper function
[Diagram: Galaxy's dynamic job destination framework calls f(mapper) to route each job either to the local DRM or to Pulsar running on one of several CloudMan clusters.]
Pulsar
A standalone job manager server for Galaxy
Can be deployed on dedicated or transient servers (even MS Windows!)
Handles data staging and remote job execution
Pulsar job
Stage data
Submit job
Monitor job
Send back the data
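The four steps of a Pulsar job can be illustrated with a toy stand-in. This sketch does not use Pulsar's API: the "remote" side is just a temporary directory on the same machine, and `run_remote_style` and its parameters are hypothetical names chosen for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_remote_style(input_path: Path, command: list,
                     output_name: str, result_dir: Path) -> Path:
    """Toy version of stage, submit, monitor, send back."""
    with tempfile.TemporaryDirectory() as staging:
        staging_dir = Path(staging)
        # 1. Stage data: copy the input into the job's working directory.
        shutil.copy(input_path, staging_dir / input_path.name)
        # 2. Submit job: start the command inside the staging directory.
        proc = subprocess.Popen(command, cwd=staging_dir)
        # 3. Monitor job: block until the process exits, then check its status.
        if proc.wait() != 0:
            raise RuntimeError(f"job failed with exit code {proc.returncode}")
        # 4. Send back the data: copy the output to where the caller expects it.
        result_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy(staging_dir / output_name,
                                result_dir / output_name))
```

A real deployment would replace the local copy and subprocess calls with network transfers and a job manager on the remote server.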
1. Galaxy dynamic job destination framework
Define job execution properties
•  Runners: local, Slurm, HTCondor, DRMAA, Pulsar, …
•  Destinations: resource & job properties (e.g., DRM queue, wall time)
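The split between runners and destinations can be pictured with plain data structures. This is a minimal sketch, assuming a hypothetical in-memory table rather than Galaxy's actual job configuration file; the destination ids, queue name, wall time, and URL are all invented for illustration.

```python
from typing import Callable, Dict

# Hypothetical destinations: each names a runner plus resource and job properties.
DESTINATIONS: Dict[str, dict] = {
    "local_slurm": {"runner": "slurm",
                    "params": {"queue": "main", "walltime": "24:00:00"}},
    "cloud_pulsar": {"runner": "pulsar",
                     "params": {"url": "https://pulsar.example.org/"}},
}


def resolve_dynamic(rule: Callable[[], str],
                    destinations: Dict[str, dict] = DESTINATIONS) -> dict:
    """A dynamic destination defers the choice to a rule evaluated at runtime."""
    destination_id = rule()
    if destination_id not in destinations:
        raise KeyError(f"unknown destination: {destination_id}")
    return destinations[destination_id]
```

For example, `resolve_dynamic(lambda: "cloud_pulsar")` returns the Pulsar entry, so the rule function alone decides where a job runs.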
2. CloudMan with Pulsar
A.  Launch a Galaxy on the Cloud instance
B.  Enable Pulsar service
C.  Add the instance as a destination in job config
Tool availability
•  Direct tool install
•  Docker images
3. Job mapper function
Determine job destination at runtime
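import pyslurm


def cloud_burst():
    """Return the id of the job destination to use for the next job."""
    # Query Slurm for the current state of every node in the local cluster.
    nodes_state = pyslurm.node().get()
    # Keep the nodes that report at least one CPU.
    available_nodes = [
        node for node in nodes_state.values()
        if node['total_cpus'] > 0
    ]
    if not available_nodes:
        # Nothing usable locally: burst to the Pulsar-enabled cloud cluster.
        return 'pulsar_nectar_galaxy'
    # Otherwise keep the job on the local DRM.
    return 'drmaa_runner'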
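The decision in the mapper can be separated from the Slurm query so it can be exercised without a cluster. This is a sketch under assumptions: `choose_destination` is a hypothetical helper, and the `total_cpus` key and the two destination ids are simply carried over from the slide.

```python
def choose_destination(nodes_state,
                       local_id="drmaa_runner",
                       burst_id="pulsar_nectar_galaxy"):
    """Return local_id if any node reports CPUs, otherwise burst_id.

    nodes_state is a dict of node name -> dict of node properties,
    shaped like the mapping the slide's code iterates over.
    """
    for node in nodes_state.values():
        if node.get("total_cpus", 0) > 0:
            return local_id
    return burst_id
```

With an empty mapping, `choose_destination({})` falls through to the burst destination; any node with a positive `total_cpus` keeps the job local.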
An outcome?
[Chart: usegalaxy.org, by week from 2013-41 to 2015-03. Axes: jobs run to completion (count) and average wait time (minutes). Series: average wait; jobs run to completion. Annotations: start bursting; no job wait; more jobs.]
An outcome?
[Chart: usegalaxy.org, by week from 2013-41 to 2015-03. Y axis: jobs deleted while queued (% of jobs submitted), 0% to 14%. Annotation: user frustration level.]