
Chi next gen-ntino-krampis



  1. Cloud BioLinux: Pre-Configured and On-Demand High Performance Computing for the Genomics Community
     Ntino Krampis, PhD
     Next-Gen Sequence Data Management '10, Providence, RI
  2. Expensive sequencing, computing and large organizations
     ● multi-million, broad-impact sequencing projects
     ● large sequencing center with a dedicated bioinformatics department
     ● large-scale computations on an SGE cluster, algorithm acceleration hardware
  3. Bench-top, commodity sequencing and small labs
     ● small form-factor sequencer available: GS Junior by 454
     ● sequencing as a standard technique in basic biology and genetics research
     ● remember microarrays and lengthy assays for protein interactions?
     ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  4. Will small labs become the long tail of sequencing?
     [Figure: long-tail plot; axis labels: amount of sequencing, number of labs. Credit: Wikimedia Commons]
     ● downstream bioinformatic analysis is required for biological discovery
     ● basic analysis example: large-scale BLAST against public DBs (try 0.5 GB at NCBI)
     ● small labs do not have the hardware, expertise, or time to install and run software locally
  5. Cloud BioLinux: pre-configured and on-demand bioinformatics on the cloud
     ● a public virtual machine (VM) on EC2 with 100+ bioinformatics tools
     ● how it came to be, what it offers for sequence analysis
     ● where and how do I run it, especially if I am not a computer expert
     ● modifying and sharing VM configurations and data with your peers
     ● openness and community around Cloud BioLinux
  6. Cloud BioLinux: The Biolinux part
     ● an Ubuntu Linux desktop for bioinformatics
     ● NEBC packages the software and maintains the repository
     +
     ● Ubuntu AMI on EC2, pulls packages from the repository
     ● additional software of interest to JCVI
     = Cloud BioLinux
  7. Cloud BioLinux: what comes in the box
     ● glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
     ● mpiBLAST clusters using EC2 virtual machine instances
     ● Celera whole genome shotgun assembler
     ● NX remote desktop, easy to use for bench-top scientists
  8. Cloud BioLinux: The Cloud part
     ● find our VM on Amazon EC2:
       Biolinux 5.0 packages (32-bit): ami-6953b200
       Biolinux 6.0 packages (64-bit): ami-6011e409, EBS based
     ● 17 GB / 6 core instances at $0.50 / hour, see
     ● a small bacterial genome assembly costs a little over $2
     ● up to 68 GB RAM / 26 core, EBS up to 1000 GB in size ($0.10 / GB / month)
     ● make a copy of our public Biolinux AMI, add your data, make it private
  9. Cloud BioLinux (credit to the NEBC team)
     simply sign up at then and
  10. Cloud BioLinux (credit to the NEBC team)
      ● find the Cloud BioLinux AMI using its ID
      ● enter the desired password for remote desktop login
      ● leave all other settings at their defaults
  11. ● get a remote desktop client:
      ● simply enter the VM's IP address and your password
  12. What if I want to share my alignments with a collaborator?
      save your data as a new AMI
      EBS costs $0.10 / GB / month; at 15 GB, that is $1.50 / month
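
The storage figure on this slide, and the hourly instance rate quoted on the earlier Cloud part slide, can be checked with a few lines of arithmetic. This is a minimal sketch in Python: the function names are hypothetical, the 4.5 hour run time is an illustrative value, and billing every started hour in full is an assumption rather than something the slides state.

```python
import math

INSTANCE_RATE_USD_PER_HOUR = 0.50  # quoted for the 17 GB instance
EBS_RATE_USD_PER_GB_MONTH = 0.10   # quoted EBS storage rate


def compute_cost(run_hours, rate=INSTANCE_RATE_USD_PER_HOUR):
    """Instance cost in USD, billing every started hour in full (assumption)."""
    return math.ceil(run_hours) * rate


def storage_cost(size_gb, months=1, rate=EBS_RATE_USD_PER_GB_MONTH):
    """EBS storage cost in USD for a volume or saved AMI of size_gb."""
    return round(size_gb * months * rate, 2)


print(storage_cost(15))   # 1.5, the figure on the slide
print(compute_cost(4.5))  # 2.5, for a hypothetical 4.5 hour run
```

Under these assumptions, a run of a little over four hours lands in the "little over $2" range quoted for a small bacterial genome assembly.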
  13. share your data: publicly or with another AWS user
      users with access can boot the AMI with all the software + data
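
Sharing an AMI comes down to editing its launch permissions. The slides do this through the AWS web console; the sketch below instead assumes the boto3 library and its EC2 `modify_image_attribute` call. The helper name `launch_permission`, the AMI ID and the account ID are hypothetical placeholders. The helper only builds the request arguments, so it runs without AWS credentials.

```python
def launch_permission(image_id, account_id=None):
    """Build arguments for boto3's EC2 client modify_image_attribute call.

    With no account_id the AMI is made public; otherwise it is shared
    with that single AWS account.
    """
    grant = {"UserId": account_id} if account_id else {"Group": "all"}
    return {"ImageId": image_id, "LaunchPermission": {"Add": [grant]}}


# Placeholder IDs: use the ID of your own saved AMI and your
# collaborator's 12-digit AWS account number.
args = launch_permission("ami-0123abcd", account_id="123456789012")
print(args)

# With credentials configured, the call would be (not executed here):
#   import boto3
#   boto3.client("ec2").modify_image_attribute(**args)
```

Keeping the payload construction separate from the network call makes the public-versus-private choice easy to inspect before anything is changed on the account.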
  14. Cloud BioLinux: The Cloud part
      ● run Cloud BioLinux on your private cloud?
      ● Eucalyptus open source cloud platform
      ● same API as EC2, without the usage charges
      ● easy to set up on your lab's cluster, comes with Ubuntu server (UEC)
      ● download VMs from Sourceforge ( )
  15. Cloud BioLinux
      ● porting VMs across cloud platforms is not trivial
      ● Cloud BioLinux VMs from EC2 to Eucalyptus: Xen kernel and boot sector
      ● framework to share VM configurations ( )
      ● based on the python-fabric automated deployment tool
      ● simply edit the software list files and share them with collaborators
      ● they start with a fresh VM; python-fabric replicates the VM setup on their cloud
  16. Cloud BioLinux: Collaboration and open source
      high-level configuration describing software groups
      for each group, individual software packages
      simply edit the files to change the VM configuration
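
The two-level layout described on this slide (a high-level list of software groups, plus a package list per group) can be illustrated with a small sketch. The data structures, group names and function name below are hypothetical and do not reproduce the project's actual file format; the package names are taken from the "what comes in the box" slide. The deployment step itself (python-fabric installing the packages on a fresh VM) is not shown.

```python
# Hypothetical two-level configuration, mirroring the slide:
# a high-level list of enabled groups, and per-group package lists.
ENABLED_GROUPS = ["alignment", "phylogeny"]

GROUP_PACKAGES = {
    "alignment": ["clustalw", "hmmer"],
    "phylogeny": ["phylip"],
    "visualization": ["rasmol"],
}


def packages_to_install(enabled_groups, group_packages):
    """Expand the enabled groups into a de-duplicated, ordered package list."""
    selected = []
    for group in enabled_groups:
        for package in group_packages[group]:
            if package not in selected:
                selected.append(package)
    return selected


print(packages_to_install(ENABLED_GROUPS, GROUP_PACKAGES))
# ['clustalw', 'hmmer', 'phylip']
```

Adding "visualization" to the enabled groups, or a package to one of the lists, changes the resulting VM without touching any deployment code, which is the point of sharing configurations rather than whole images.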
  17. Cloud BioLinux: The community
      ● from JCVI and NEBC to an open-source, community-based project
      ● community initiated during a tele-conference meeting at SC '10, Portland, OR
      ● first meeting this past July in Boston,
      ● work done: 64-bit AMIs, NX remote desktop, setting up the fabric framework
      ● next year's meeting at ISMB/BOSC in Vienna, Austria
      ● and most important,
  18. Cloud BioLinux: The future
      ● expand the community, receive feedback, add more software to the VM
      ● genome assemblers, high-memory EC2 instances up to 68 GB RAM
      ● Hadoop / MapReduce (for those running the VM in private clouds)
      ● analysis pipelines that are used by large sequencing centers
      ● actively seeking funding to put major effort into development
      ● or
  19. Acknowledgments & Credits
      Brad Chapman - development of the fabric scripts and community organizer
      Tim Booth, Bela Tiwari - BioLinux 6.0 development and EC2 documentation
      Deepak Singh and AWS - education grant supporting codefest workshop
      Justin Johnson - community and sponsorship of
      J. Craig Venter Inst. - time allowed to work on an open-source project
      D. Gomez, E. Navarro, J. Shao, I. Singh - JCVI technology innovation
      Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
      Thank you!