Afgan bosc2010 galaxy_cloud

  • 668 views
Uploaded on

 

More in: Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
668
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
17
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. C A M P A I G N Chief Development Officer Emory University C A M P A I G N School of Nursing Development Emory University 1520 Clifton Road, NE School of Nursing Development Atlanta, Georgia 30322-4207 1520 Clifton Road, NE P 404.727.1234 Atlanta, Georgia 30322-4207 E adorrill@emory.edu Deploying heteroplasmic on Galaxy sites Discovery of human heteroplasmic sites enabled by an accessible interface to Discovery of human P 404.727.1234 E adorrill@emory.edu cloud the Cloud computing infrastructure enabled by an accessible interface to cloud computing infrastructure Enis Afgan, Dannon Baker, Nate Coraor, Anton Enis Afgan, Hiroki Goto, Ian Paul, Francesca Chiaromonte Nekrutenko, James Taylor Kateryna Makova, Anton Nekrutenko, James Taylor Enis Afgan, Hiroki Goto, Ian Paul, Francesca Chiaromonte Kateryna Makova, Anton Nekrutenko, James Taylor Bioinformatics Open Source Conference, July 9, 2010, Boston, MA missions: you are free to blog or live-blog about this presentation as long as you attribute the work to its au
  • 2. Galaxy: accessible analysis system • Easily integrate new tools • Consistent tool user interfaces automatically generated • History system facilitates and tracks multistep analyses • Exact parameters of a step can always be inspected, and easily rerun • Work ow system http://usegalaxy.org/ Enable accessible, transparent, and reproducible research
  • 3. Galaxy + Jobs Galaxy + Jobs Job Job Job Job Workstation Galaxy Galaxy Galaxy Cluster Galaxy
  • 4. Galaxy on the Cloud • Ideal for small labs and individual researchers • Labs do not have to house compute resources • Support variable volume of analysis data and computation requirements • Ready deployment with pre-con gured reference genomes and tools • Goal is to keep Galaxy use unchanged but deliver exibility and job performance improvement
  • 5. Current Status • Deployment of Galaxy on Amazon Web Services Cloud • Requires no computational expertise, no infrastructure, no software • Support for dynamic resource scaling • Support for dynamic storage • Automated con guration of the Galaxy Cloud machine image • Deploy a Galaxy cluster in minutes!
  • 6. Deploying Galaxy on the AWS Cloud 1. Create an AWS account and sign up for EC2 and S3 services 2. Use the AWS Management Console to start a master EC2 instance 3. Use the Galaxy Cloud web interface on the master instance to manage the cluster size
  • 7. 2. Start an EC2 Instance
  • 8. 3. Con gure Your Cluster
  • 9. (Starting Workers)
  • 10. 4. Grow and Shrink
  • 11. Grow Storage 1. Stop services 2. Detach volume 3. Snapshot 4. New volume 5. Grow le system 6. Resume services
  • 12. Clean Up • Once the need for a given cluster subsides, - you can always start it back up • Data is preserved while a cluster is down • Complete the shut down process by terminating the master instance from the AWS console
  • 13. What is Coming • Automatic cluster scaling - Based on workload customization • Automatic job splitting/parallelization
  • 14. Questions & Comments Try your own cluster; it takes only 5 minutes and less than $1. Complete instructions available at http://usegalaxy.org/cloud
  • 15. A Little More GC Details Persistent storage 5° Management 2° 1° Console Galaxy Image 3° 6°, 8° Galaxy Controller 4° Persistent (GC) data repository Setup services 9° 7° Galaxy Image GC-w 10° Galaxy 11° Galaxy Image GC-w Application Galaxy Image GC-w Master instance Galaxy Image GC-w Galaxy Image GC-w
  • 16. Cloud or No Cloud? Pros Cons • Consumption based • Not a silver bullet cost - cost reduction? • Expensive for 24/7 use • Better utilization of • Offers scalability in resource terms of infrastructure, • Management done applications are still by cloud provider sequential • Faster deployment • The data transfer time problem? • Dynamic scalability • Security?
  • 17. Enabling Persistence User B User A User A Cluster 1 Cluster 1 Cluster 1 Galaxy Galaxy User On terminate Data Tools Tools Public EBS snapshots Galaxy Galaxy Indices Indices Private EBS Galaxy volume User Tools User Data Data Galaxy Indices User A Cluster 2 Galaxy Tools Galaxy Indices User Data
  • 18. Enabling Versioning Private S3 bucket GC-User A, GC-default Cluster1 Public GC-User A, S3 buckets Cluster1 - latest GC used GC source - snaps IDs - latest GC-default - prev. versions GC-snaps GC-snaps GC-User A, Cluster2 GC-User A, Cluster2 Public snap IDs - latest GC used - latest - snaps IDs - prev. versions