Chi next gen-ntino-krampis
 

    Presentation Transcript

    • Cloud BioLinux: Pre-Configured and On-Demand High Performance Computing for the Genomics Community
      Ntino Krampis, PhD
      Next-Gen Sequence Data Management '10, Providence, RI
    • Expensive sequencing, computing and large organizations
      ● multi-million-dollar, broad-impact sequencing projects
      ● large sequencing centers, each with a dedicated bioinformatics department
      ● large-scale computations on an SGE cluster with algorithm-acceleration hardware
    • Bench-top, commodity sequencing and small labs
      ● small form-factor sequencers are available: GS Junior by 454
      ● sequencing is becoming a standard technique in basic biology and genetics research
      ● remember microarrays and lengthy assays for protein interactions?
      ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
    • Will small labs become the long tail of sequencing?
      [Figure: long-tail curve with axes "amount of sequencing" and "number of labs". Credit: Wikimedia Commons]
      ● downstream bioinformatic analysis is required for biological discovery
      ● basic analysis example: large-scale BLAST against public DBs (try submitting 0.5 GB at NCBI)
      ● small labs do not have the hardware, expertise, or time to install and run software locally
    • Cloud BioLinux: pre-configured and on-demand bioinformatics on the cloud
      ● a public virtual machine (VM) on EC2 with 100+ bioinformatics tools
      ● how it came to be, and what it offers for sequence analysis
      ● where and how to run it, especially if you are not a computer expert
      ● modifying and sharing VM configurations and data with your peers
      ● openness and community around Cloud BioLinux
    • Cloud BioLinux: the BioLinux part
      ● BioLinux: an Ubuntu Linux desktop for bioinformatics (tinyurl.com/BioLinux-NEBC)
      ● NEBC packages the software and maintains the repository
      ● plus: an Ubuntu AMI on EC2 that pulls packages from the repository
      ● plus: additional software of interest to JCVI
      ● result: Cloud BioLinux (tinyurl.com/CloudBioLinux-JCVI)
    • Cloud BioLinux: what comes in the box
      ● glimmer, hmmer, phylip, rasmol, genespring, clustalw, EMBOSS
      ● mpiBLAST clusters built from EC2 virtual machine instances
      ● Celera whole-genome shotgun assembler
      ● NX remote desktop, easy to use for bench-top scientists
    • Cloud BioLinux: the Cloud part
      ● find our VM on Amazon EC2:
        BioLinux 5.0 packages (32-bit): ami-6953b200
        BioLinux 6.0 packages (64-bit): ami-6011e409, EBS-based
      ● 17 GB RAM / 6 core instances cost $0.50 / hour, see aws.amazon.com/ec2/pricing
      ● a small bacterial genome assembly costs a little over $2
      ● instances go up to 68 GB RAM / 26 core; EBS volumes go up to 1000 GB in size ($0.10 / GB / month)
      ● make a copy of our public BioLinux AMI, add your data, and make it private
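The hourly pricing quoted above lends itself to a quick back-of-the-envelope estimate. The following is a minimal sketch, not an official pricing calculator: the function name, the 4.5-hour runtime, and the assumption that every started hour is billed as a full hour are illustrative and do not come from the slides.

```python
import math

# $0.50 per instance-hour, the rate quoted on the slide, kept in cents
# so the arithmetic stays exact.
HOURLY_RATE_CENTS = 50


def instance_cost_dollars(runtime_hours, rate_cents=HOURLY_RATE_CENTS):
    """Estimate the on-demand cost of a single instance run.

    Assumption (not from the slides): every started hour is billed
    as a full hour, so the runtime is rounded up.
    """
    billed_hours = math.ceil(runtime_hours)
    return billed_hours * rate_cents / 100


# Hypothetical 4.5-hour assembly run on the $0.50/hour instance.
print(instance_cost_dollars(4.5))
```

Under these assumptions a run of four to five hours lands in the range the slide describes as "a little over $2".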
    • Cloud BioLinux tutorial: http://tinyurl.com/cloud-biolinux-tutorial (credit to the NEBC team)
      ● simply sign up at aws.amazon.com, then go to aws.amazon.com/console and:
      ● find the Cloud BioLinux AMI using its ID
      ● enter the desired password for remote desktop login
      ● leave all other settings at their defaults
    • ● get a remote desktop client: nomachine.com/download.php
      ● simply enter the VM's IP address and your password
    • What if I want to share my alignments with a collaborator?
      ● save your data as a new AMI
      ● EBS storage costs $0.10 / GB / month; at 15 GB, that is $1.50 / month
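The storage arithmetic above can be written out as a small helper. This is a sketch under the slide's stated rate of $0.10 per GB per month; the function and constant names are hypothetical, and the rate is kept in cents so the results stay exact.

```python
# $0.10 per GB per month, the EBS rate quoted on the slide, in cents.
EBS_RATE_CENTS_PER_GB_MONTH = 10


def ebs_monthly_cost_dollars(size_gb, rate_cents=EBS_RATE_CENTS_PER_GB_MONTH):
    """Return the monthly storage cost in dollars for an EBS-backed AMI."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * rate_cents / 100


print(ebs_monthly_cost_dollars(15))    # the 15 GB example from the slide
print(ebs_monthly_cost_dollars(1000))  # the largest EBS size mentioned in the talk
```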
    • Share your data: publicly or with another AWS user
      ● users with access can boot the AMI with all the software and data
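The talk demonstrates sharing through the AWS web console. For readers who prefer scripting, the same step can be sketched with the present-day boto3 library, which is an assumption here and not what the slides use. The helper names, the default region, and the 12-digit account ID are hypothetical; `modify_image_attribute` with a `LaunchPermission` argument is the boto3 EC2 client call for changing who may launch an AMI.

```python
def build_launch_permission(account_ids=(), public=False):
    """Build the LaunchPermission payload that grants launch access.

    account_ids: AWS account IDs (strings) of collaborators.
    public: if True, also grant access to every AWS user.
    """
    entries = [{"UserId": account_id} for account_id in account_ids]
    if public:
        entries.append({"Group": "all"})
    return {"Add": entries}


def share_ami(image_id, account_ids=(), public=False, region="us-east-1"):
    """Apply the launch permission to an AMI (needs AWS credentials).

    boto3 is imported lazily so the payload helper above can be used
    and tested without boto3 installed.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission=build_launch_permission(account_ids, public),
    )


# Hypothetical collaborator account ID, for illustration only.
print(build_launch_permission(["123456789012"]))
```

Keeping the payload construction separate from the network call makes it possible to check what will be shared before any AWS request is sent.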
    • Cloud BioLinux: the Cloud part
      ● run Cloud BioLinux on your private cloud?
      ● Eucalyptus open-source cloud platform (open.eucalyptus.com)
      ● API identical to EC2, without the usage charges
      ● easy to set up on your lab's cluster; ships with Ubuntu Server as Ubuntu Enterprise Cloud (UEC)
      ● download VMs from SourceForge (tinyurl.com/CloudBiolinux-SF)
    • Cloud BioLinux
      ● porting VMs across cloud platforms is not trivial
      ● moving Cloud BioLinux VMs from EC2 to Eucalyptus involves the Xen kernel and boot sector
      ● framework to share VM configurations (tinyurl.com/bootstrap-cloudbiolinux)
      ● based on the python-fabric automated deployment tool (tinyurl.com/python-fabric)
      ● simply edit the software list files and share them with collaborators
      ● they start with a fresh VM, and python-fabric replicates the VM setup on their cloud
    • Cloud BioLinux: collaboration and open source (tinyurl.com/CloudBioLinux-github)
      ● a high-level configuration file describes the software groups
      ● for each group, a file lists the individual software packages
      ● simply edit the files to change the VM configuration
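The two-level layout described above (software groups, then packages per group) can be illustrated with a short sketch. This does not reproduce the project's actual file format or its fabric scripts: the group names, the in-memory dictionary standing in for the configuration files, and both function names are hypothetical. Only the package names are taken from the talk.

```python
# Hypothetical stand-in for the per-group package list files.
GROUP_PACKAGES = {
    "alignment": ["clustalw", "hmmer"],
    "phylogeny": ["phylip", "clustalw"],
    "visualization": ["rasmol"],
}


def resolve_packages(enabled_groups, group_packages):
    """Expand the enabled groups into an ordered, duplicate-free package list."""
    packages = []
    for group in enabled_groups:
        for package in group_packages[group]:
            if package not in packages:
                packages.append(package)
    return packages


def apt_install_command(packages):
    """Build the shell command a deployment tool could run on a fresh VM."""
    return "apt-get install -y " + " ".join(packages)


# Hypothetical high-level configuration: only two groups are enabled.
enabled = ["alignment", "phylogeny"]
print(apt_install_command(resolve_packages(enabled, GROUP_PACKAGES)))
```

Editing the list of enabled groups, or the packages inside a group, changes the generated install command, which mirrors the idea of changing the VM by editing plain files and sharing them.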
    • Cloud BioLinux: the community
      ● from JCVI and NEBC to an open-source, community-based project
      ● community initiated during a teleconference meeting at SC '10, Portland, OR
      ● first meeting this past July in Boston, tinyurl.com/openbio-codefest-2010
      ● work done: 64-bit AMIs, NX remote desktop, setting up the fabric framework
      ● next year's meeting at ISMB/BOSC in Vienna, Austria, http://metalab.at/
      ● cloudbiolinux.com and, most important, tinyurl.com/cloudbiolinux-lists
    • Cloud BioLinux: the future
      ● expand the community, receive feedback, add more software to the VM
      ● genome assemblers on high-memory EC2 instances with up to 68 GB RAM
      ● Hadoop / MapReduce (for those running the VM in private clouds)
      ● analysis pipelines that are used by large sequencing centers
      ● actively seeking funding to put major effort into development
      ● tinyurl.com/cloudbiolinux-lists or community@cloudbiolinux.com
    • Acknowledgments & Credits
      ● Brad Chapman: development of the fabric scripts and community organizer
      ● Tim Booth, Bela Tiwari: BioLinux 6.0 development and EC2 documentation
      ● Deepak Singh and AWS: education grant supporting the codefest workshop
      ● Justin Johnson: community and sponsorship of cloudbiolinux.com
      ● J. Craig Venter Inst.: time allowed to work on an open-source project
      ● D. Gomez, E. Navarro, J. Shao, I. Singh: JCVI technology innovation
      ● Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
      Thank you!