Bosc2011 ntino-krampis-full

Cloud BioLinux: open source, fully-customizable
bioinformatics computing on the cloud for the
genomics community and beyond

BOSC 2011 - Vienna, Austria

Ntino Krampis, PhD
Asst. Professor
J. Craig Venter Institute (JCVI)
agbiotec@gmail.com

The community is what makes an open source project

Brad Chapman, Tim Booth, Mesude Bicak, Dawn Field, Dan Pass –
core development and planning

Enis Afgan, Pjotr Prins, Stephen Möller -
and all other members of the Cloud BioLinux community who move it forward

J. Craig Venter Inst. -
time allowed to work on an open-source project

Expensive sequencing and large organizations

● large sequencing center, multi-million, broad-impact sequencing projects
● dedicated bioinformatics department, compute clusters

Commodity sequencing and small labs

● small form-factor, bench-top sequencer available: GS Junior by 454
● sequencing as a standard technique in basic biology and genetics research
● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome

Will small labs become the long tail of sequencing?

[Figure: long-tail curve; y-axis: amount of sequencing; x-axis: number of labs. Credit: Wikimedia Commons]

“Bioinformatics nation is a land of city-states” Lincoln Stein

● small labs building small-scale bioinformatics infrastructures
● duplication of effort in compiling and installing software tools
● some groups have no hardware, expertise, or time to install and run software

● NEBC BioLinux ( tinyurl.com/BioLinux-NEBC ) 100+ pre-configured tools
● example: glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS

how about large-scale sequence datasets?

Cloud BioLinux
pre-configured and on-demand bioinformatics computing on the cloud

● JCVI cloud computing research
+ ● NEBC BioLinux software repository
+ ● community effort – Hackathon / BOSC 2010-11
+ ● Virtual Machine (VM) on Amazon cloud

=

● large-scale computing independently of institutional or geographic boundaries
● only need a desktop computer with internet access

cloudbiolinux.org

simple for end-users: sign up at aws.amazon.com

http://tinyurl.com/cloud-biolinux-tutorial

Amazon EC2 → linux desktop via remote desktop client

What if I want to share my alignments with a collaborator?

save your data as a new VM

$0.10 / GB / month

at 15 GB, it costs $1.50 / month
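
The storage arithmetic above can be checked with a minimal sketch. The function name and its default rate are illustrative assumptions; the rate is simply the per-GB figure quoted on this slide, not a current Amazon price.

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.10):
    """Return the monthly cost in US dollars of storing size_gb gigabytes.

    The default rate is the figure quoted on the slide (an assumption here,
    not a price looked up from Amazon).
    """
    return size_gb * price_per_gb_month


# A 15 GB saved VM at the quoted rate
cost = monthly_storage_cost(15)
print(f"${cost:.2f} / month")
```

Passing a different `price_per_gb_month` shows how the estimate changes if the storage rate differs.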

“whole system snapshot exchange” (Dudley and Butte 2010)
capture the state of the computing system and data
software execution parameters and “massaged” input datasets

Cloud BioLinux developer's framework
create cloud VM / images with standardized software configurations

● customize Cloud BioLinux based on community requirements

● mix and match software from NEBC or other (DebianMed, Scientific Linux etc.)

● share customized VMs with collaborators, avoiding effort duplication

● deploy Cloud BioLinux on private and local clouds

software domains in bioinformatics: nextgen
sequencing, de novo assembly, annotation, phylogeny,
molecular structures, gene expression analysis

github.com/chapmanb/cloudbiolinux

Cloud BioLinux developer's framework

● based on python-fabric auto-deployment tool

● software components listed in plain text files

● collaborators use files to share descriptions of cloud VM / images

● start with a bare-bones VM / image

● fabric downloads and installs specified software

tinyurl.com/python-fabric
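
The idea of describing a VM in plain text files can be sketched as follows. This is not the Cloud BioLinux code itself: the file format (one package name per line, `#` for comments), the function names, and the example list are all hypothetical, and the sketch only builds the shell commands that a deployment tool such as fabric might then run on a bare-bones VM. It does not call fabric.

```python
def parse_package_list(text):
    """Return package names from a plain-text list.

    Assumed (hypothetical) format: one package per line; blank lines and
    anything after a '#' are ignored.
    """
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            packages.append(name)
    return packages


def build_install_commands(packages):
    """Build shell commands that would install the packages on a Debian-style VM."""
    if not packages:
        return []
    return ["apt-get update", "apt-get install -y " + " ".join(packages)]


# Hypothetical package list, using tools named earlier in the talk
EXAMPLE_LIST = """\
# sequence analysis tools
hmmer
clustalw
emboss   # EMBOSS suite
"""

if __name__ == "__main__":
    for command in build_install_commands(parse_package_list(EXAMPLE_LIST)):
        print(command)
```

Because the description is just text, collaborators could exchange or edit the list and rebuild the same image from a bare-bones VM, which is the sharing workflow the bullets above describe.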

Cloud BioLinux
The future

● groups.google.com/cloudbiolinux and cloudbiolinux.org

● expand community, receive feedback, add more software to the VM

● scalable computing: SGE (Galaxy Cloudman), Hadoop (cloudgene.uibk.ac.at)

● add next-gen sequencing pipelines, NIH funding - adds effort in development

● We just had a 2-day codefest at the MetaLab, http://metalab.at/

and before I finish this talk...
