1. C A M P A I G N
Chief Development Officer
Emory University
C A M P A I G N
School of Nursing Development
Emory University
1520 Clifton Road, NE
School of Nursing Development
Atlanta, Georgia 30322-4207
1520 Clifton Road, NE
P 404.727.1234
Atlanta, Georgia 30322-4207
E adorrill@emory.edu
Deploying heteroplasmic on
Galaxy sites
Discovery of human heteroplasmic sites
enabled by an accessible interface to
Discovery of human
P 404.727.1234
E adorrill@emory.edu
cloud the Cloud
computing infrastructure
enabled by an accessible interface to
cloud computing infrastructure
Enis Afgan, Dannon Baker, Nate Coraor, Anton
Enis Afgan, Hiroki Goto, Ian Paul, Francesca Chiaromonte
Nekrutenko, James Taylor
Kateryna Makova, Anton Nekrutenko, James Taylor
Enis Afgan, Hiroki Goto, Ian Paul, Francesca Chiaromonte
Kateryna Makova, Anton Nekrutenko, James Taylor
Bioinformatics Open Source Conference, July 9, 2010, Boston, MA
missions: you are free to blog or live-blog about this presentation as long as you attribute the work to its au
2. Galaxy: accessible
analysis system
• Easily integrate new tools
• Consistent tool user interfaces
automatically generated
• History system facilitates and
tracks multistep analyses
• Exact parameters of a step
can always be inspected, and
easily rerun
• Work ow system
http://usegalaxy.org/
Enable accessible, transparent, and reproducible research
3. Galaxy
+ Jobs Galaxy
+ Jobs
Job Job
Job
Job
Workstation
Galaxy
Galaxy
Galaxy
Cluster Galaxy
4. Galaxy on the Cloud
• Ideal for small labs and individual researchers
• Labs do not have to house compute resources
• Support variable volume of analysis data and
computation requirements
• Ready deployment with pre-con gured
reference genomes and tools
• Goal is to keep Galaxy use unchanged but deliver
exibility and job performance improvement
5. Current Status
• Deployment of Galaxy on Amazon Web Services Cloud
• Requires no computational expertise, no
infrastructure, no software
• Support for dynamic resource scaling
• Support for dynamic storage
• Automated con guration of the Galaxy Cloud machine
image
• Deploy a Galaxy cluster in minutes!
6. Deploying Galaxy on
the AWS Cloud
1. Create an AWS account and sign up for EC2
and S3 services
2. Use the AWS Management Console to start a
master EC2 instance
3. Use the Galaxy Cloud web interface on the
master instance to manage the cluster size
12. Grow Storage
1. Stop services
2. Detach volume
3. Snapshot
4. New volume
5. Grow le system
6. Resume services
13. Clean Up
• Once the need for a given cluster subsides,
- you can always start it
back up
• Data is preserved while a cluster is down
• Complete the shut down process by
terminating the master instance from the
AWS console
14. What is Coming
• Automatic cluster scaling
- Based on workload customization
• Automatic job splitting/parallelization
15. Questions
&
Comments
Try your own cluster; it takes only 5
minutes and less than $1.
Complete instructions available at
http://usegalaxy.org/cloud
17. Cloud or No Cloud?
Pros Cons
• Consumption based • Not a silver bullet
cost - cost reduction?
• Expensive for 24/7 use
• Better utilization of
• Offers scalability in
resource
terms of infrastructure,
• Management done applications are still
by cloud provider sequential
• Faster deployment • The data transfer
time problem?
• Dynamic scalability • Security?
18. Enabling Persistence
User B User A User A
Cluster 1 Cluster 1 Cluster 1
Galaxy Galaxy User
On terminate Data
Tools Tools
Public EBS
snapshots
Galaxy Galaxy
Indices Indices Private EBS
Galaxy volume
User Tools User
Data Data
Galaxy
Indices
User A
Cluster 2
Galaxy
Tools
Galaxy
Indices
User
Data
19. Enabling Versioning
Private
S3 bucket GC-User A,
GC-default Cluster1
Public GC-User A,
S3 buckets Cluster1 - latest GC used
GC source
- snaps IDs
- latest
GC-default - prev. versions
GC-snaps
GC-snaps GC-User A,
Cluster2
GC-User A,
Cluster2 Public snap IDs - latest GC used
- latest - snaps IDs
- prev. versions