1. Cloud BioLinux: open source, fully customizable bioinformatics computing on the cloud for the genomics community and beyond
BOSC 2011 - Vienna, Austria
Ntino Krampis, PhD, Asst. Professor, J. Craig Venter Institute (JCVI)
email@example.com
2. Expensive sequencing and large organizations
● large sequencing centers running multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, large Sun Grid Engine cluster
Commodity sequencing and small labs
● small form-factor, bench-top sequencers available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
3. Will small labs become the long tail of sequencing?
[Figure: long-tail curve; axis labels: amount of sequencing, number of labs. Credit: Wikimedia Commons]
4. “Bioinformatics nation is a land of city-states” (Lincoln Stein)
● small labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some labs have no hardware, expertise, or time to install and run software
● NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ): 100+ pre-configured tools
● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
How about large-scale sequence datasets?
5. Cloud BioLinux: pre-configured and on-demand bioinformatics computing on the cloud
● JCVI cloud computing research
● NEBC bioinformatics software repository
● community effort: Hackathon / BOSC 2010-11
● pre-configured Virtual Machine (VM, image)
● large-scale computing independent of institutional or geographic boundaries
● only need a desktop computer with internet access
cloudbiolinux.org
6. Cloud BioLinux: simple for end-users
sign up at aws.amazon.com, then go to aws.amazon.com/console and http://tinyurl.com/cloud-biolinux-tutorial
8. What if I want to share my alignments with a collaborator?
save your data as a new VM: $0.10 / GB / month
at 15 GB, it costs $1.50 / month
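The storage arithmetic on this slide can be checked with a minimal sketch. The function name `monthly_storage_cost` is hypothetical, and the default rate is simply the figure quoted on the slide, which may not match current pricing.

```python
def monthly_storage_cost(size_gb, rate_per_gb_month=0.10):
    """Return the monthly cost in dollars of storing `size_gb` gigabytes.

    The default rate is the per-GB figure quoted on the slide (an assumption
    for illustration, not a current price).
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    # Round to cents to avoid floating-point noise such as 1.5000000000000002.
    return round(size_gb * rate_per_gb_month, 2)


if __name__ == "__main__":
    # The slide's example: a 15 GB saved VM image.
    print(monthly_storage_cost(15))
```

Passing a different `rate_per_gb_month` lets a lab redo the estimate against whatever rate its provider currently charges.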
9. “whole system snapshot exchange” (Dudley and Butte 2010)
capture the state of the computing system and data
software execution parameters and “massaged” input datasets
10. Cloud BioLinux developers framework
create cloud VM / images with standardized software configurations
● customize Cloud BioLinux based on community requirements
● mix and match software from NEBC or other sources (DebianMed, Scientific Linux, etc.)
● share customized VMs with collaborators, avoiding duplication of effort
● deploy Cloud BioLinux on private and local clouds
11. Cloud BioLinux developers framework
● based on the python-fabric auto-deployment tool
● software components listed in plain text files
● collaborators use the files to share descriptions of cloud VM / images
● start with a bare-bones VM / image
● fabric downloads and installs the specified software
tinyurl.com/python-fabric
open.eucalyptus.com
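The idea of describing an image with plain text files can be sketched as follows. This is not the project's actual script or file format: the one-package-per-line layout with `#` comments, the function names, and the use of `apt-get` are all assumptions made for illustration. The sketch only builds the install command; a deployment tool such as fabric would be the piece that runs it on the bare-bones VM.

```python
import shlex


def parse_package_list(text):
    """Return package names from a plain-text list.

    Assumed (hypothetical) format: one package per line, '#' starts a comment,
    blank lines are ignored, duplicates are dropped while keeping order.
    """
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in packages:
            packages.append(name)
    return packages


def build_install_command(packages):
    """Build a single shell command that installs every listed package.

    Assumes a Debian/Ubuntu-style base image with apt-get available.
    """
    if not packages:
        return ""
    quoted = " ".join(shlex.quote(p) for p in packages)
    return "apt-get install -y " + quoted


# Hypothetical package list a collaborator might share.
EXAMPLE_LIST = """\
# tools for a customized image
clustalw
hmmer
emboss   # sequence analysis suite

clustalw
"""

if __name__ == "__main__":
    print(build_install_command(parse_package_list(EXAMPLE_LIST)))
```

Because the list is plain text, two labs can compare or merge their image descriptions with ordinary diff tools, and the same list can be replayed against any fresh VM.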
12. software domains in bioinformatics: next-gen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
github.com/chapmanb/cloudbiolinux
13. Cloud BioLinux: the future
● expand the community, receive feedback, add more software to the VM
● groups.google.com/cloudbiolinux and cloudbiolinux.org
● add data analysis pipelines that are used by sequencing centers
● actively seeking funding to put major effort into development
● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
14. Acknowledgments & Credits
Brad Chapman - development of the fabric scripts and community organizer
Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari - BioLinux 6.0
J. Craig Venter Inst. - time allowed to work on an open-source project
D. Gomez, E. Navarro, J. Shao, I. Singh - JCVI technology innovation
Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
Members of the Cloud BioLinux community - precious development time
Thank you!