F02-Cloud-Cloud BioLinux

Transcript

  • 1. Cloud BioLinux: open source, fully-customizable bioinformatics computing on the cloud for the genomics community and beyond
      BOSC 2011 - Vienna, Austria
      Ntino Krampis, PhD, Asst. Professor, J. Craig Venter Institute (JCVI)
      agbiotec@gmail.com
  • 2. Expensive sequencing and large organizations
      ● large sequencing center, multi-million, broad-impact sequencing projects
      ● dedicated bioinformatics department, large Sun Grid Engine cluster
      Commodity sequencing and small labs
      ● small-factor, bench-top sequencer available: GS Junior by 454
      ● sequencing as a standard technique in basic biology and genetics research
      ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  • 3. Will small labs become the long tail of sequencing?
      [Figure: long-tail curve; axes: amount of sequencing, number of labs. Credit: Wikimedia Commons]
  • 4. “Bioinformatics nation is a land of city-states” (Lincoln Stein)
      ● small labs building small-scale bioinformatics infrastructures
      ● duplication of effort in compiling and installing software tools
      ● some labs have no hardware, expertise, or time to install and run software
      ● NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ): 100+ pre-configured tools
      ● examples: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
      How about large-scale sequence datasets?
  • 5. Cloud BioLinux: pre-configured and on-demand bioinformatics computing on the cloud
      ● JCVI cloud computing research +
      ● NEBC bioinformatics software repository +
      ● community effort: Hackathon / BOSC 2010-11
      =
      ● pre-configured Virtual Machine (VM, image)
      ● large-scale computing independently of institutional or geographic boundaries
      ● only need a desktop computer with internet access
      cloudbiolinux.org
  • 6. Cloud BioLinux: simple for end-users
      Sign up at aws.amazon.com, then go to aws.amazon.com/console and http://tinyurl.com/cloud-biolinux-tutorial
  • 7. Amazon EC2 → Linux desktop via remote desktop client
  • 8. What if I want to share my alignments with a collaborator?
      Save your data as a new VM: $0.10 / GB / month; at 15 GB, it costs $1.50 / month
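
The storage arithmetic on slide 8 can be checked with a few lines of Python. This is only a sketch: the function name `monthly_storage_cost` is made up for illustration, and the flat $0.10 / GB / month rate is the figure quoted on the slide, not a current price.

```python
from decimal import Decimal


def monthly_storage_cost(size_gb, rate_per_gb_month=Decimal("0.10")):
    """Return the monthly cost in dollars of storing size_gb gigabytes.

    Assumes a flat per-GB rate (the slide's 2011 figure by default).
    Pass size_gb as an int or a string so the Decimal value is exact.
    """
    return Decimal(size_gb) * rate_per_gb_month


if __name__ == "__main__":
    # The slide's example: a 15 GB saved VM image.
    print(monthly_storage_cost(15))
```

Decimal arithmetic is used instead of floats so that currency amounts are not affected by binary rounding.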
  • 9. “Whole system snapshot exchange” (Dudley and Butte 2010)
      ● capture the state of the computing system and data
      ● software execution parameters and “massaged” input datasets
  • 10. Cloud BioLinux developers framework: create cloud VM / images with standardized software configurations
      ● customize Cloud BioLinux based on community requirements
      ● mix and match software from NEBC or other sources (DebianMed, Scientific Linux, etc.)
      ● share customized VMs with collaborators, avoiding effort duplication
      ● deploy Cloud BioLinux on private and local clouds
  • 11. Cloud BioLinux developers framework
      ● based on the python-fabric auto-deployment tool
      ● software components listed in plain text files
      ● collaborators use files to share descriptions of cloud VM / images
      ● start with a bare-bones VM / image
      ● fabric downloads and installs specified software
      tinyurl.com/python-fabric
      open.eucalyptus.com
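
The idea on slide 11, a plain-text software list that drives installation on a bare-bones image, can be illustrated roughly as follows. This is a minimal sketch and not the project's actual fabric scripts or file format: the one-package-per-line layout with `#` comments, the function names, and the example list are all assumptions. The `run` argument stands in for whatever executes a command on the remote machine (in a real deployment, a fabric call); here it defaults to printing.

```python
import shlex


def parse_package_list(text):
    """Return package names from a plain-text list, one name per line.

    Blank lines and '#' comments are ignored; duplicates are dropped
    while keeping first-seen order. (Hypothetical file format.)
    """
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in packages:
            packages.append(name)
    return packages


def build_install_command(packages):
    """Build a single apt-get command string, or None if nothing to install."""
    if not packages:
        return None
    quoted = " ".join(shlex.quote(p) for p in packages)
    return "apt-get install -y " + quoted


def deploy(text, run=print):
    """Parse the list and hand the resulting command to the `run` callable."""
    command = build_install_command(parse_package_list(text))
    if command is not None:
        run(command)
    return command


# Illustrative list only; tool names are taken from slide 4.
EXAMPLE_LIST = """
# sequence analysis tools
hmmer
clustalw
emboss   # EMBOSS suite
"""

if __name__ == "__main__":
    deploy(EXAMPLE_LIST)
```

Keeping the list in a plain file separate from the deployment logic is what lets collaborators share a VM description by exchanging a small text file rather than a whole image.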
  • 12. Software domains in bioinformatics: next-gen sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
      github.com/chapmanb/cloudbiolinux
  • 13. Cloud BioLinux: the future
      ● expand community, receive feedback, add more software to the VM
      ● groups.google.com/cloudbiolinux and cloudbiolinux.org
      ● add data analysis pipelines that are used by sequencing centers
      ● actively seeking funding to put major effort into development
      ● 2011 ISMB/BOSC in Vienna, Austria, http://metalab.at/
  • 14. Acknowledgments & Credits
      ● Brad Chapman – development of the fabric scripts and community organizer
      ● Tim Booth, Mesude Bicak, Dawn Field, Bela Tiwari – BioLinux 6.0
      ● J. Craig Venter Inst. – time allowed to work on an open-source project
      ● D. Gomez, E. Navarro, J. Shao, I. Singh – JCVI technology innovation
      ● Deepak Singh and AWS – education grant supporting ISMB / BOSC workshop
      ● Members of the Cloud BioLinux community – precious development time
      Thank you!
