Afgan bosc2010 galaxy_cloud


Published in: Technology, Business

  1. 1. Deploying Galaxy on the Cloud. Enis Afgan, Dannon Baker, Nate Coraor, Anton Nekrutenko, James Taylor. Discovery of human heteroplasmic sites enabled by an accessible interface to cloud computing infrastructure. Enis Afgan, Hiroki Goto, Ian Paul, Francesca Chiaromonte, Kateryna Makova, Anton Nekrutenko, James Taylor. Bioinformatics Open Source Conference, July 9, 2010, Boston, MA. missions: you are free to blog or live-blog about this presentation as long as you attribute the work to its au
  2. 2. Galaxy: accessible analysis system • Easily integrate new tools • Consistent tool user interfaces automatically generated • History system facilitates and tracks multistep analyses • Exact parameters of a step can always be inspected, and easily rerun • Workflow system. Enable accessible, transparent, and reproducible research
  3. 3. Galaxy + Jobs [Diagram: Galaxy and its jobs on a Workstation and on a Cluster]
  4. 4. Galaxy on the Cloud • Ideal for small labs and individual researchers • Labs do not have to house compute resources • Support variable volume of analysis data and computation requirements • Ready deployment with pre-configured reference genomes and tools • Goal is to keep Galaxy use unchanged but deliver flexibility and job performance improvement
  5. 5. Current Status • Deployment of Galaxy on Amazon Web Services Cloud • Requires no computational expertise, no infrastructure, no software • Support for dynamic resource scaling • Support for dynamic storage • Automated configuration of the Galaxy Cloud machine image • Deploy a Galaxy cluster in minutes!
  6. 6. Deploying Galaxy on the AWS Cloud 1. Create an AWS account and sign up for EC2 and S3 services 2. Use the AWS Management Console to start a master EC2 instance 3. Use the Galaxy Cloud web interface on the master instance to manage the cluster size
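The slides perform step 2 by hand in the AWS Management Console. As a rough programmatic counterpart (not the presenters' method), the sketch below only assembles the arguments that boto3's EC2 `run_instances` call accepts; it does not contact AWS. The function name and the values `ami-EXAMPLE`, `m1.large`, and `my-key` are placeholders assumed for illustration, not values from the talk.

```python
def build_master_launch_request(image_id, instance_type, key_name, user_data=""):
    """Return keyword arguments shaped for boto3's EC2 `run_instances` call.

    Hypothetical helper: the slides start the master instance from the
    AWS Management Console instead.
    """
    if not image_id:
        raise ValueError("image_id is required")
    return {
        "ImageId": image_id,            # machine image to boot (placeholder below)
        "InstanceType": instance_type,  # size of the master instance
        "KeyName": key_name,            # SSH key pair registered with EC2
        "MinCount": 1,                  # exactly one master instance
        "MaxCount": 1,
        "UserData": user_data,          # optional start-up data passed to the instance
    }


# Placeholder values, assumed for illustration only.
request = build_master_launch_request("ami-EXAMPLE", "m1.large", "my-key")

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("ec2").run_instances(**request)
```

Worker instances are then added from the Galaxy Cloud web interface on the master, as step 3 describes.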
  7. 7. 2. Start an EC2 Instance
  8. 8. 3. Configure Your Cluster
  9. 9. (Starting Workers)
  10. 10. 4. Grow and Shrink
  11. 11. Grow Storage 1. Stop services 2. Detach volume 3. Snapshot 4. New volume 5. Grow file system 6. Resume services
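The six steps on this slide are strictly ordered, which can be sketched as a small runner. This is a minimal illustration, assuming each step is supplied as a zero-argument callable; the names `GROW_STEPS` and `grow_storage` are hypothetical, and the stand-in actions only record that they ran rather than touching any volume.

```python
# Step names taken from the slide, in the order given there.
GROW_STEPS = [
    "stop services",
    "detach volume",
    "snapshot",
    "new volume",
    "grow file system",
    "resume services",
]


def grow_storage(actions):
    """Run every step in slide order; `actions` maps step name -> callable.

    Returns the list of completed step names. An exception raised by any
    step stops the sequence at that step.
    """
    completed = []
    for step in GROW_STEPS:
        actions[step]()
        completed.append(step)
    return completed


# Stand-in actions that only log their own name (no real AWS calls).
log = []
actions = {step: (lambda s=step: log.append(s)) for step in GROW_STEPS}
completed = grow_storage(actions)
```

In a real deployment each callable would wrap the corresponding service or volume operation; what those operations look like is not shown in the slides.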
  12. 12. Clean Up • Once the need for a given cluster subsides, shut it down - you can always start it back up • Data is preserved while a cluster is down • Complete the shutdown process by terminating the master instance from the AWS console
  13. 13. What is Coming • Automatic cluster scaling - Based on workload customization • Automatic job splitting/parallelization
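The slide does not say how workload-based scaling would be decided. One simple possibility, given purely as an assumption, is to size the cluster from the number of queued jobs and clamp the result to configured bounds. The function name, its parameters, and the default limits are all hypothetical.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker, min_workers=0, max_workers=10):
    """Return a worker count for the current queue length.

    Hypothetical policy: one worker per `jobs_per_worker` queued jobs,
    rounded up, then clamped to [min_workers, max_workers].
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For example, with 9 queued jobs and 4 jobs per worker this policy asks for 3 workers, and an empty queue lets the cluster shrink to `min_workers`.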
  14. 14. Questions & Comments Try your own cluster; it takes only 5 minutes and less than $1. Complete instructions available at
  15. 15. A Little More GC Details [Diagram, steps 1° to 11°: Management Console; Master instance with Galaxy Image, Galaxy Controller (GC), Setup services, and Galaxy Application; Persistent storage; Persistent data repository; multiple Galaxy Image GC-w instances]
  16. 16. Cloud or No Cloud? Pros: • Consumption based cost • Better utilization of resources • Management done by cloud provider • Faster deployment • Dynamic scalability. Cons: • Not a silver bullet - cost reduction? • Expensive for 24/7 use • Offers scalability in terms of infrastructure, applications are still sequential • The data transfer time problem? • Security?
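The tension between "consumption based cost" and "expensive for 24/7 use" comes down to a break-even point. The sketch below assumes a deliberately simplified model: a flat hourly cloud rate compared against a fixed monthly cost for local hardware. Both function names are hypothetical, and no real prices are implied.

```python
def break_even_hours(fixed_monthly_cost, hourly_rate):
    """Hours per month at which cloud spending equals the fixed local cost."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return fixed_monthly_cost / hourly_rate


def cheaper_option(hours_used, fixed_monthly_cost, hourly_rate):
    """Return "cloud" if pay-per-use is cheaper for this usage, else "local"."""
    cloud_cost = hours_used * hourly_rate
    return "cloud" if cloud_cost < fixed_monthly_cost else "local"
```

With made-up figures of 200 per month for local hardware and 0.5 per cloud hour, the break-even is 400 hours: occasional use (100 hours) favors the cloud, while running around the clock (about 720 hours a month) favors local hardware.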
  17. 17. Enabling Persistence [Diagram: clusters User A Cluster 1, User A Cluster 2, and User B Cluster 1, each with Galaxy Tools, Galaxy Indices, and User Data; Public EBS snapshots; Private EBS volume; On terminate]
  18. 18. Enabling Versioning [Diagram: Private S3 bucket with GC-User A, Cluster1 and GC-User A, Cluster2 (latest GC used, snaps IDs); Public S3 buckets with GC-default (GC source: latest, prev. versions) and GC-snaps (Public snap IDs: latest, prev. versions)]