Bosc2011 ntino-krampis-full


  1. Cloud BioLinux: open-source, fully customizable bioinformatics computing on the cloud for the genomics community and beyond
     BOSC 2011, Vienna, Austria
     Ntino Krampis, PhD, Asst. Professor, J. Craig Venter Institute (JCVI)
  2. The community is what makes an open-source project
     ● Brad Chapman, Tim Booth, Mesude Bicak, Dawn Field, Dan Pass: core development and planning
     ● Enis Afgan, Pjotr Prins, Stephen Möller, and all other members of the Cloud BioLinux community who move it forward
     ● J. Craig Venter Institute: time allowed to work on an open-source project
  3. Expensive sequencing and large organizations:
     ● large sequencing centers running multi-million, broad-impact sequencing projects
     ● dedicated bioinformatics departments and compute clusters
     Commodity sequencing and small labs:
     ● small form-factor, bench-top sequencers are available, e.g. the GS Junior by 454
     ● sequencing becomes a standard technique in basic biology and genetics research
     ● RNA-seq and ChIP-seq; each biologist will be tackling a metagenome
  4. Will small labs become the long tail of sequencing?
     [Figure: long-tail plot; axes: amount of sequencing, number of labs. Credit: Wikimedia Commons]
  5. “Bioinformatics nation is a land of city-states” (Lincoln Stein)
     ● small labs are building small-scale bioinformatics infrastructures
     ● effort is duplicated in compiling and installing software tools
     ● some groups have no hardware, expertise, or time to install and run software
     ● NEBC BioLinux: 100+ pre-configured tools, for example glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
     How about large-scale sequence datasets?
  6. Cloud BioLinux: pre-configured, on-demand bioinformatics computing on the cloud
       JCVI cloud computing research
     + NEBC BioLinux software repository
     + community effort (Hackathon / BOSC 2010-11)
     + Virtual Machine (VM) on the Amazon cloud
     = large-scale computing independent of institutional or geographic boundaries; all you need is a desktop computer with an internet connection
  7. Simple for end users: sign up at
  8. Amazon EC2 → Linux desktop via a remote desktop client
  9. What if I want to share my alignments with a collaborator?
     Save your data as a new VM: at $0.10 / GB / month, 15 GB costs $1.50 / month
  10. “Whole system snapshot exchange” (Dudley and Butte 2010)
      ● capture the state of the computing system and the data: software execution parameters and “massaged” input datasets
  11. Cloud BioLinux developers framework: create cloud VMs / images with standardized software configurations
      ● customize Cloud BioLinux based on community requirements
      ● mix and match software from NEBC or other sources (DebianMed, Scientific Linux, etc.)
      ● share customized VMs with collaborators, avoiding duplication of effort
      ● deploy Cloud BioLinux on private and local clouds
  12. Software domains in bioinformatics: next-generation sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
  13. Cloud BioLinux developers framework
      ● based on the python-fabric auto-deployment tool
      ● software components are listed in plain-text files
      ● collaborators use these files to share descriptions of cloud VMs / images
      ● start with a bare-bones VM / image
      ● fabric downloads and installs the specified software
  14. Cloud BioLinux: the future
      ● expand the community, receive feedback, add more software to the VM
      ● scalable computing: SGE (Galaxy CloudMan), Hadoop
      ● add next-gen sequencing pipelines; NIH funding adds development effort
      ● we just had a 2-day codefest at the MetaLab
  15. And before I finish this talk...
  16. Thank you!
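The storage arithmetic on slide 9 can be reproduced in a few lines. This is a minimal sketch assuming the flat $0.10 per GB per month rate quoted in the 2011 talk; the function names are illustrative, and current cloud storage prices may differ by region and storage type. `Decimal` is used instead of floats so that cent amounts stay exact.

```python
# Minimal sketch of the slide 9 storage-cost arithmetic.
# ASSUMPTION: a flat rate of $0.10 per GB per month, as quoted in the 2011 talk;
# real cloud pricing may differ by region and storage type.
from decimal import Decimal

RATE_USD_PER_GB_MONTH = Decimal("0.10")  # rate quoted on the slide


def monthly_storage_cost(size_gb, rate=RATE_USD_PER_GB_MONTH):
    """Return the monthly cost in USD of keeping size_gb gigabytes stored."""
    return Decimal(str(size_gb)) * rate


def storage_cost(size_gb, months, rate=RATE_USD_PER_GB_MONTH):
    """Return the total cost in USD of keeping size_gb gigabytes for `months` months."""
    return monthly_storage_cost(size_gb, rate) * months


print(monthly_storage_cost(15))  # 1.50
print(storage_cost(15, 12))      # 18.00
```

At this rate, sharing the 15 GB VM from the slide for a full year would cost $18.00.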
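Slide 13 describes VM customization driven by plain-text lists of software components, applied to a bare-bones image with Fabric. The sketch below illustrates that idea under stated assumptions: the `installer: package` line format, the `INSTALLERS` mapping, and the example package names are hypothetical and are not the actual Cloud BioLinux file format. It only parses a list and builds shell commands; in a Fabric-based deployment each command would then be executed on the remote VM (for example with Fabric 1.x's `sudo()` operation), a step left out here so the code runs locally.

```python
# Minimal sketch: plain-text package lists -> install commands for a bare-bones VM.
# HYPOTHETICAL format (not the real Cloud BioLinux files): one "<installer>: <package>"
# entry per line; everything after '#' is a comment.

EXAMPLE_LIST = """
# sequence analysis
apt: emboss
apt: hmmer
apt: clustalw
# phylogeny
apt: phylip
# Python libraries
pip: biopython
"""

# Hypothetical mapping from installer name to the command prefix it uses.
INSTALLERS = {
    "apt": "apt-get install -y",
    "pip": "pip install",
}


def parse_package_list(text):
    """Group package names by installer, keeping first-seen order and dropping duplicates."""
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        installer, sep, package = line.partition(":")
        installer, package = installer.strip(), package.strip()
        if not sep or not package:
            raise ValueError(f"malformed entry: {raw!r}")
        if installer not in INSTALLERS:
            raise ValueError(f"unknown installer: {installer!r}")
        packages = groups.setdefault(installer, [])
        if package not in packages:
            packages.append(package)
    return groups


def install_commands(groups):
    """Build one shell command per installer; on a real VM these would need root rights."""
    return [f"{INSTALLERS[name]} {' '.join(pkgs)}" for name, pkgs in groups.items()]


if __name__ == "__main__":
    for command in install_commands(parse_package_list(EXAMPLE_LIST)):
        print(command)
```

Because a collaborator's additions are just more lines in the same format, sharing a customized VM description amounts to concatenating files: `parse_package_list(base_text + "\n" + extra_text)` merges both lists and drops duplicate entries.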