F02-Cloud-Cloud BioLinux

  1. Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond
     BOSC 2011, Vienna, Austria
     Ntino Krampis, PhD, Asst. Professor, J. Craig Venter Institute (JCVI)
     agbiotec@gmail.com
  2. Expensive sequencing and large organizations vs. commodity sequencing and small labs
     Expensive sequencing and large organizations:
     ● large sequencing center, multi-million-dollar, broad-impact sequencing projects
     ● dedicated bioinformatics department, large Sun Grid Engine cluster
     Commodity sequencing and small labs:
     ● small-form-factor, bench-top sequencer available: GS Junior by 454
     ● sequencing as a standard technique in basic biology and genetics research
     ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  3. Will small labs become the long tail of sequencing?
     [Figure: long-tail curve of amount of sequencing vs. number of labs. Credit: Wikimedia Commons]
  4. “Bioinformatics nation is a land of city-states” (Lincoln Stein)
     ● small labs building small-scale bioinformatics infrastructures
     ● duplication of effort in compiling and installing software tools
     ● some labs have no hardware, expertise, or time to install and run software
     ● NEBC BioLinux (tinyurl.com/BioLinux-NEBC): 100+ pre-configured tools
     ● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
     How about large-scale sequence datasets?
  5. Cloud BioLinux: pre-configured and on-demand bioinformatics computing on the cloud
     ● JCVI cloud computing research
     + NEBC bioinformatics software repository
     + community effort (Hackathon / BOSC 2010-11)
     = pre-configured Virtual Machine (VM, image)
     ● large-scale computing independently of institutional or geographic boundaries
     ● only need a desktop computer with internet access
     cloudbiolinux.org
  6. Cloud BioLinux: simple for end-users
     ● sign up at aws.amazon.com
     ● then use aws.amazon.com/console and the tutorial at http://tinyurl.com/cloud-biolinux-tutorial
  7. Amazon EC2 → Linux desktop via a remote desktop client
  8. What if I want to share my alignments with a collaborator?
     Save your data as a new VM: $0.10 / GB / month; at 15 GB, it costs $1.50 / month.
  9. “Whole system snapshot exchange” (Dudley and Butte 2010): capture the state of the computing system and data, i.e. software execution parameters and “massaged” input datasets
  10. Cloud BioLinux developers framework: create cloud VMs / images with standardized software configurations
     ● customize Cloud BioLinux based on community requirements
     ● mix and match software from NEBC or other sources (DebianMed, Scientific Linux, etc.)
     ● share customized VMs with collaborators, avoiding duplication of effort
     ● deploy Cloud BioLinux on private and local clouds
  11. Cloud BioLinux developers framework
     ● based on the python-fabric auto-deployment tool
     ● software components listed in plain text files
     ● collaborators use the files to share descriptions of cloud VMs / images
     ● start with a bare-bones VM / image
     ● fabric downloads and installs the specified software
     tinyurl.com/python-fabric
     open.eucalyptus.com
  12. Software domains in bioinformatics: next-generation sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
     github.com/chapmanb/cloudbiolinux
  13. Cloud BioLinux: the future
     ● expand the community, receive feedback, add more software to the VM
     ● groups.google.com/cloudbiolinux and cloudbiolinux.org
     ● add data analysis pipelines that are used by sequencing centers
     ● actively seeking funding to put major effort into development
     ● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
  14. Acknowledgments & Credits
     Brad Chapman – development of the fabric scripts and community organizer
     Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0
     J. Craig Venter Inst. – time allowed to work on an open-source project
     D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
     Deepak Singh and AWS – education grant supporting ISMB / BOSC workshop
     Members of the Cloud BioLinux community – precious development time
     Thank you!
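The storage estimate on slide 8 is a single multiplication. The sketch below reproduces it, assuming storage is billed linearly per GB-month at the $0.10 rate quoted on the slide (a 2011 figure; current AWS pricing differs). `monthly_storage_cost` is a hypothetical helper, and `Decimal` is used so currency amounts are computed in exact decimal arithmetic rather than binary floating point.

```python
from decimal import Decimal

# Rate quoted on slide 8 (2011); an example value, not current AWS pricing.
RATE_USD_PER_GB_MONTH = Decimal("0.10")


def monthly_storage_cost(size_gb: int, rate: Decimal = RATE_USD_PER_GB_MONTH) -> Decimal:
    """Monthly cost (USD) of keeping a saved VM/image of size_gb gigabytes,
    assuming a flat per-GB-month rate (hypothetical helper)."""
    return rate * size_gb


if __name__ == "__main__":
    print(monthly_storage_cost(15))  # 1.50
```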
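Slide 11 describes the developers framework as plain-text lists of software components that fabric installs on a bare-bones VM. The sketch below only illustrates that idea; it is not the project's actual file format or code (github.com/chapmanb/cloudbiolinux has its own configuration layout). It assumes a hypothetical list format with one Debian package name per line and `#` comments; `parse_package_list` and `install_commands` are hypothetical helpers, and the package names are illustrative.

```python
from typing import Iterable, List


def parse_package_list(text: str) -> List[str]:
    """Parse a hypothetical plain-text list: one package name per line.

    Blank lines and '#' comments are ignored; duplicates are dropped
    while preserving first-seen order.
    """
    packages: List[str] = []
    seen = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop trailing comment and whitespace
        if name and name not in seen:
            seen.add(name)
            packages.append(name)
    return packages


def install_commands(packages: Iterable[str]) -> List[str]:
    """Build one non-interactive apt-get command per package
    (assumes a Debian/Ubuntu-based VM)."""
    return [f"apt-get install -y {name}" for name in packages]


if __name__ == "__main__":
    # Hypothetical list a collaborator might share instead of a whole VM.
    shared_list = """
    # sequence analysis tools (illustrative)
    emboss
    hmmer
    clustalw   # multiple sequence alignment
    hmmer
    """
    for cmd in install_commands(parse_package_list(shared_list)):
        print(cmd)
```

To run the commands on a remote VM, each string could be handed to a deployment tool; in Fabric 1.x, for example, a task could pass each one to `fabric.api.sudo`. Keeping the list separate from the install logic is what lets collaborators exchange just the text file to describe a VM.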
