Ntino Krampis GSC 2011

“Cloud BioLinux: Standardized, Pre-Configured and On-Demand Computing for Genomics and Beyond”. Genomics Standards Consortium Conference 2010, European Bioinformatics Institute, Hinxton, UK

Published in: Technology

  1. Cloud BioLinux: Standardized, Pre-Configured and On-Demand Computing for Genomics and Beyond
     Ntino Krampis, PhD
     GSC 2011, Hinxton, UK
  2. Expensive sequencing and large organizations
     ● large sequencing centers with multi-million, broad-impact sequencing projects
     ● dedicated bioinformatics department, coordination with other centers
     Commodity sequencing and small labs
     ● small-factor, bench-top sequencer available: GS Junior by 454
     ● sequencing as a standard technique in basic biology and genetics research
     ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  3. “Bioinformatics nation is a land of city-states” (Lincoln Stein)
     ● smaller labs building small-scale bioinformatics infrastructures
     ● duplication of effort in compiling and installing software tools
     ● some labs have no hardware, expertise, or time to install and run software
     ● an early pioneer in this area was NEBC BioLinux ( tinyurl.com/BioLinux-NEBC )
     ● desktop Linux with 100+ pre-configured bioinformatics tools
     ● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
     How about large-scale sequence datasets?
  4. Cloud BioLinux
     standardized, pre-configured and on-demand bioinformatics computing on the cloud
     ● JCVI's cloud computing expertise
     ● NEBC's bioinformatics software repository
     ● community effort – ISMB / BOSC 2010
     ● standardized, pre-configured Virtual Machine (VM, image)
     ● VM: emulates a computer server, encapsulates operating system, software libraries and bioinformatics tools
     ● Amazon EC2: computational capacity as a utility, on-demand
     ● rich interface through a remote desktop client
     tinyurl.com/CloudBioLinux-JCVI
     http://cloudbiolinux.com
  5. Cloud BioLinux and Genomic Standards
     framework to distribute bioinformatics tools, data and analysis results
     create cloud VM / images with standardized software configurations
     ● customize Cloud BioLinux VMs, based on community requirements
     ● share customized VMs with collaborators, avoiding effort duplication
     ● mix and match software from NEBC or other (DebianMed, Scientific Linux etc.)
     whole system snapshot exchange (Dudley and Butte 2010)
     ● capture the state of the computing system and data
     ● software execution parameters and “massaged” input datasets
     ● save into cloud VM / image and share along with analysis results
     democratize access to computing resources
     ● large-scale computing independently of institutional or geographic boundaries
     ● only need a desktop computer with internet access
  6. Cloud BioLinux and Genomic Standards
     create cloud VM / images with standard software configurations
     ● framework to describe software components in cloud VM / image
     ● based on the python-fabric automated deployment tool
     ● software components listed in simple text files
     ● edit the files to mix and match software according to your community needs
     ● community members use the files to share descriptions of customized systems
     ● start with a bare-bones VM; fabric downloads and installs the specified software
     ● labs with sensitive data and capacity for private clouds: works identically on Amazon EC2 or the Eucalyptus open-source cloud
     tinyurl.com/python-fabric
     open.eucalyptus.com
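
     The idea of listing software components in simple text files can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual fabric scripts: the file layout (one package per line, `#` for comments), the function names, and the use of `apt-get` are all hypothetical choices made for this example.

```python
def parse_package_list(text):
    """Return package names from a plain-text list, one name per line.

    Assumed convention (not taken from the slides): blank lines and
    anything after '#' are ignored, and duplicates are dropped.
    """
    packages = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if line and line not in packages:
            packages.append(line)
    return packages


def install_command(packages):
    """Build one apt-get command line (assumed package manager)."""
    return ["apt-get", "install", "-y"] + list(packages)


# Hypothetical file contents; tool names are borrowed from slide 3.
example_file = """\
# sequence alignment
clustalw
hmmer

# phylogeny
phylip
hmmer   # listed twice, kept once
"""

packages = parse_package_list(example_file)
command = install_command(packages)
print(" ".join(command))
```

     Editing the text file is then enough to change what a fresh VM receives; a deployment tool would run the resulting command on the bare-bones machine.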
  7. software domains in bioinformatics: next-gen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
     high-level configuration describing software groups
     for each group, individual bioinformatics tools
     tinyurl.com/CloudBioLinux-github
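
     The two-level layout described on this slide (a high-level list of software groups, then the individual tools in each group) might be modeled as below. This is a sketch only: the group keys, the assignment of tools to groups, and the `resolve_tools` helper are assumptions for illustration and do not reproduce the files in the project repository.

```python
# Hypothetical mapping from software group to individual tools.
# Group names follow the domains on the slide; the grouping is assumed.
TOOLS_BY_GROUP = {
    "annotation": ["glimmer", "hmmer", "emboss"],
    "phylogeny": ["phylip", "clustalw"],
    "molecular_structures": ["rasmol"],
}


def resolve_tools(enabled_groups, tools_by_group):
    """Expand the enabled groups into a flat, de-duplicated tool list."""
    unknown = [g for g in enabled_groups if g not in tools_by_group]
    if unknown:
        raise KeyError(f"unknown software groups: {unknown}")
    resolved = []
    for group in enabled_groups:
        for tool in tools_by_group[group]:
            if tool not in resolved:
                resolved.append(tool)
    return resolved


# A community interested only in phylogeny and structures enables two groups.
selected = resolve_tools(["phylogeny", "molecular_structures"], TOOLS_BY_GROUP)
print(selected)
```

     Keeping the group list separate from the per-group tool lists lets a community switch whole domains on or off without touching the individual tool entries.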
  8. Cloud BioLinux and Genomic Standards
     whole system snapshot exchange
     simply sign up at aws.amazon.com, then go to aws.amazon.com/console and http://tinyurl.com/cloud-biolinux-tutorial
  9. Cloud BioLinux and Genomic Standards
     whole system snapshot exchange
     find Cloud BioLinux using ID
     enter desired password for remote desktop login
     leave all other settings at default
     http://tinyurl.com/cloud-biolinux-tutorial
  10. free remote desktop client: nomachine.com/download.php
      simply enter VM IP address and your password
  11. What if I want to share my alignments with a collaborator?
      save your data as a new VM
      $0.10 / GB / month
      at 15 GB, it costs $1.50 / month
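
      The storage arithmetic on this slide can be checked with a short calculation. The price per GB per month is the figure quoted on the slide; the function name and the use of `Decimal` to avoid floating-point rounding are choices made for this sketch.

```python
from decimal import Decimal

# Price quoted on the slide, in US dollars per GB per month.
PRICE_PER_GB_MONTH = Decimal("0.10")


def monthly_storage_cost(size_gb, price_per_gb_month=PRICE_PER_GB_MONTH):
    """Return the monthly cost of storing a VM image of the given size."""
    return Decimal(str(size_gb)) * price_per_gb_month


cost = monthly_storage_cost(15)
print(f"${cost:.2f} per month")
```

      The cost scales linearly with image size, so at the same rate a 30 GB image would cost twice as much.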
  12. Cloud BioLinux and Genomic Standards
      whole system snapshot exchange
      share your analysis results: publicly or only with your collaborators
      authorized users can access the cloud VM / image with all the software, data, analysis results
  13. Cloud BioLinux and Genomic Standards
      whole system snapshot exchange
      [Diagram: researcher A and researcher B; steps shown for each: start VM / image, perform analysis, snapshot, share]
  14. Cloud BioLinux: The future
      ● expand community, receive feedback, add more software to the VM
      ● analysis pipelines that are used by large sequencing centers
      ● actively seeking funding to put major effort in development
      ● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
      ● tinyurl.com/cloudbiolinux-lists or community@cloudbiolinux.com
  15. Acknowledgments & Credits
      Brad Chapman - development of the fabric scripts and community organizer
      Tim Booth, Bela Tiwari, Dawn Field - BioLinux 6.0 development and EC2 documentation
      Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
      Justin Johnson - community and sponsorship of cloudbiolinux.com
      J. Craig Venter Inst. - time allowed to work on an open-source project
      D. Gomez, E. Navarro, J. Shao, I. Singh - JCVI technology innovation
      Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
      Thank you!
