Chi next gen-ntino-krampis



  1. Cloud BioLinux: Pre-Configured and On-Demand High Performance Computing for the Genomics Community
     Ntino Krampis, PhD
     Next-Gen Sequence Data Management '10, Providence, RI
  2. Expensive sequencing, computing and large organizations
     ● multi-million, broad-impact sequencing projects
     ● large sequencing center, with a dedicated bioinformatics department
     ● large-scale computations on SGE cluster, algorithm acceleration hardware
  3. Bench-top, commodity sequencing and small labs
     ● small form-factor sequencer available: GS Junior by 454
     ● sequencing as a standard technique in basic biology and genetics research
     ● remember microarrays and lengthy assays for protein interactions?
     ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  4. Will small labs become the long tail of sequencing?
     [Figure: long-tail plot of amount of sequencing against number of labs. Credit: Wikimedia Commons]
     ● downstream bioinformatic analysis is required for biological discovery
     ● basic analysis example: large-scale BLAST against public DBs (try 0.5GB at NCBI)
     ● small labs do not have the hardware, expertise, or time to install and run software locally
  5. Cloud Biolinux: pre-configured and on-demand bioinformatics on the cloud
     ● a public virtual machine (VM) on EC2 with 100+ bioinformatics tools
     ● how it came to be, what it offers for sequence analysis
     ● where and how do I run it, especially if I am not a computer expert
     ● modifying and sharing VM configurations and data with your peers
     ● openness and community around Cloud Biolinux
  6. Cloud Biolinux: The Biolinux part
     ● an Ubuntu Linux desktop for bioinformatics
     ● NEBC packages the software and maintains the repository
     ● Ubuntu AMI on EC2, pulls packages from the repository
     ● additional software of interest to JCVI
  7. Cloud Biolinux: what comes in the box
     ● glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
     ● mpiBLAST clusters using EC2 virtual machine instances
     ● Celera whole genome shotgun assembler
     ● NX remote desktop, easy to use for benchtop scientists
  8. Cloud Biolinux: The Cloud part
     ● find our VM on Amazon EC2:
       Biolinux 5.0 packages (32-bit): ami-6953b200
       Biolinux 6.0 packages (64-bit): ami-6011e409, EBS based
     ● 17GB / 6 core instances at $0.50 / hour, see
     ● a small bacterial genome assembly costs a little over $2
     ● up to 68GB RAM / 26 core, EBS up to 1000 GB in size ($0.10 / GB / month)
     ● make a copy of our public Biolinux AMI, add your data, make it private
  9. Cloud Biolinux (credit to the NEBC team)
     simply sign up at then and
  10. Cloud Biolinux (credit to the NEBC team)
      ● find the Cloud Biolinux AMI using its ID
      ● enter the desired password for remote desktop login
      ● leave all other settings at their defaults
  11. ● get remote desktop client:
      ● simply enter the VM's IP address and your password
  12. What if I want to share my alignments with a collaborator?
      ● save your data as a new AMI
      ● EBS storage costs $0.10 / GB / month
      ● at 15GB, that is $1.50 / month
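
The storage arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two rates are the 2010 figures quoted in the slides, the function names are made up for this example, and the 4-hour run is a hypothetical input rather than a measured assembly time.

```python
from decimal import Decimal

# Rates as quoted on the slides (2010 figures; current AWS pricing will differ).
EBS_RATE_PER_GB_MONTH = Decimal("0.10")   # dollars per GB per month
INSTANCE_RATE_PER_HOUR = Decimal("0.50")  # dollars per hour, 17GB instance


def ebs_monthly_cost(size_gb):
    """Monthly storage cost in dollars for an EBS-backed AMI of size_gb gigabytes."""
    return EBS_RATE_PER_GB_MONTH * Decimal(str(size_gb))


def instance_cost(hours):
    """Compute cost in dollars for running one instance for the given hours."""
    return INSTANCE_RATE_PER_HOUR * Decimal(str(hours))


if __name__ == "__main__":
    # The 15GB AMI from the slide.
    print("storage per month:", ebs_monthly_cost(15))
    # Hypothetical 4-hour run on the $0.50 / hour instance.
    print("compute for 4 hours:", instance_cost(4))
```

`Decimal` is used instead of floats so that the dollar amounts do not pick up binary rounding noise.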
  13. share your data: public or with another AWS user
      ● users with access can boot the AMI with all the software + data
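
The slides show sharing through the AWS web console. As a rough sketch of the same step done programmatically, the snippet below builds the launch-permission change that the EC2 `ModifyImageAttribute` call accepts. The helper names, the example AMI ID placeholder and the account ID are all hypothetical; the `ec2_client` argument is assumed to be a boto3 EC2 client (for example `boto3.client("ec2")`), which is not something the original talk used.

```python
def launch_permission(account_ids=(), public=False):
    """Build a LaunchPermission change for an AMI.

    public=True grants launch access to everyone; otherwise access is
    granted to each AWS account ID listed in account_ids.
    """
    if public:
        return {"Add": [{"Group": "all"}]}
    if not account_ids:
        raise ValueError("give at least one account ID or set public=True")
    return {"Add": [{"UserId": str(account)} for account in account_ids]}


def share_ami(ec2_client, image_id, account_ids=(), public=False):
    """Apply the launch-permission change to image_id.

    ec2_client is assumed to expose modify_image_attribute, as a boto3
    EC2 client does.
    """
    return ec2_client.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission=launch_permission(account_ids, public),
    )


if __name__ == "__main__":
    # Hypothetical collaborator account ID, printed without calling AWS.
    print(launch_permission(account_ids=["123456789012"]))
    print(launch_permission(public=True))
```

Keeping the payload construction separate from the AWS call makes it possible to inspect what would be sent before touching a real account.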
  14. Cloud Biolinux: The Cloud part
      ● run Cloud Biolinux on your private cloud?
      ● Eucalyptus open source cloud platform
      ● API identical to EC2's, without the usage charges
      ● easy to set up on your lab's cluster, comes with Ubuntu server (UEC)
      ● download VMs from Sourceforge ( )
  15. Cloud Biolinux
      ● porting VMs across cloud platforms is not trivial
      ● moving Cloud Biolinux VMs from EC2 to Eucalyptus involves the Xen kernel and boot sector
      ● framework to share VM configurations ( )
      ● based on the python-fabric automated deployment tool
      ● simply edit the software list files and share them with collaborators
      ● they start with a fresh VM, and python-fabric replicates the VM setup on their cloud
  16. Cloud Biolinux: Collaboration and open source
      ● high-level configuration describing software groups
      ● for each group, individual software packages
      ● simply edit the files to change the VM configuration
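
The two-level layout described on this slide (a list of enabled software groups, and a package list per group) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the group names, the data layout and the helper functions are hypothetical and are not the project's actual configuration files or fabric scripts. The package names are taken from the tool list on slide 7 and are assumed to match their Ubuntu package names.

```python
# Hypothetical high-level configuration: which software groups to install.
ENABLED_GROUPS = ["alignment", "phylogeny"]

# Hypothetical per-group package lists.
PACKAGES_BY_GROUP = {
    "alignment": ["clustalw", "hmmer"],
    "phylogeny": ["phylip"],
    "visualization": ["rasmol"],
}


def resolve_packages(enabled_groups, packages_by_group):
    """Return the sorted, de-duplicated packages for the enabled groups."""
    packages = set()
    for group in enabled_groups:
        if group not in packages_by_group:
            raise KeyError("unknown software group: " + group)
        packages.update(packages_by_group[group])
    return sorted(packages)


def install_command(packages):
    """Build the apt-get argument list that would install the packages."""
    return ["apt-get", "install", "-y"] + list(packages)


if __name__ == "__main__":
    selected = resolve_packages(ENABLED_GROUPS, PACKAGES_BY_GROUP)
    # Print the command instead of running it; a deployment tool would
    # execute it on the fresh VM.
    print(" ".join(install_command(selected)))
```

Adding "visualization" to `ENABLED_GROUPS` is the kind of one-line edit the slide refers to: a collaborator who receives the edited lists gets the same package set on a fresh VM.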
  17. Cloud Biolinux: The community
      ● from JCVI and NEBC to an open-source, community-based project
      ● community initiated during a teleconference meeting at SC '10, Portland, OR
      ● first meeting this past July in Boston
      ● work done: 64-bit AMIs, NX remote desktop, set up the fabric framework
      ● next year's meeting at ISMB/BOSC in Vienna, Austria
      ● and most important,
  18. Cloud Biolinux: The future
      ● expand the community, receive feedback, add more software to the VM
      ● genome assemblers, high-memory EC2 instances up to 68GB RAM
      ● Hadoop / MapReduce (for those running the VM in private clouds)
      ● analysis pipelines that are used by large sequencing centers
      ● actively seeking funding to put major effort into development
      ● or
  19. Acknowledgments & Credits
      ● Brad Chapman – development of the fabric scripts and community organizer
      ● Tim Booth, Bela Tiwari – BioLinux 6.0 development and EC2 documentation
      ● Deepak Singh and AWS – education grant supporting codefest workshop
      ● Justin Johnson – community and sponsorship of
      ● J. Craig Venter Inst. – time allowed to work on an open-source project
      ● D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
      ● Members of the Cloud Biolinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
      Thank you!