Ntino Cloud BioLinux Barcelona Spain 2012

Transcript of "Ntino Cloud BioLinux Barcelona Spain 2012"

  1. Cloud BioLinux: Pre-configured Bioinformatics Computing for the Genomics Community
     Ntino Krampis, Asst. Professor - Informatics, J. Craig Venter Institute
     kkrampis@jcvi.org
     http://www.jcvi.org/cms/about/bios/kkrampis/
     Tuesday, November 6, 12
  2. J. Craig Venter Institute (JCVI)
     • Human Microbiome Project (Nelson et al. Science 2010; 328: 994–99)
     • NIH funded, launched in 2008, $115 million
     • metagenomic sequencing of microbial genomes from the human body
     • sequence everything in sample, use informatics to separate genomes
  3. J. Craig Venter Institute
     • Global Ocean Survey (first publication, Venter et al. Science 2004; 304: 66-74)
     • metagenomic sequencing of microbes from oceans around the world
     • Darwin’s route ?
     • Numbers: HMP > 2 mil. new proteins, GOS > 1.2
  4. Big Data and sequencing
     • JCVI sequencing facility: 454, Solexa, HiSeq, and IonTorrent on the way
     • Processed data: size information content
     • But... look at SOLiD 3
     Source: http://www.politigenomics.com/next-generation-sequencing-informatics
  5. JCVI: sequencing and computing infrastructure
     • “big” sequencing needs large-scale informatics
     • ~1000 node Grid Engine cluster
     • research with Hadoop / MapReduce, and a small private cloud
     • 50+ bioinformaticians and software developers
  6. A new paradigm: Low-cost, bench-top sequencers
     • GS Junior (454), MiSeq (Illumina)
     • complete sequencing of bacterial, viral, fungal genomes
     • RNA-seq (gene expression), ChIP-seq (protein interactions), gene variant discovery
     • sequencing as a standard technique in basic genetics research - like PCR ?
  7. Will smaller academic labs become the long tail of sequencing ?
     [Figure: long-tail plot. y-axis: amount of sequencing; x-axis: number of labs. Head: “sequencing factories” (JCVI, Broad Inst., Washington Univ., Inst. of Genome Sciences). Tail: small academic labs with bench-top sequencers.]
  8. Sequencers shipped without clusters
     • Problem A: sequence analysis requires computational capacity
     • genome assembly, BLAST, gene finders - annotation
     • Problem B: bioinformatics tools need software engineering expertise
     • unix/linux operating systems, maintaining software libraries, compiling source code
  9. Each lab builds a cluster ?
     • need additional funds to buy the hardware
     • funds for personnel to maintain the cluster and software
     • duplication of effort across labs
     • sub-optimal utilization of the hardware
  10. Centralized bioinformatics services
      • Bioinformatic Resource Centers, e.g. GSCID
      • bioinformatic services usually coupled with sequencing of a genome
      • provide mostly data access to external PIs
      • cannot support every lab with a sequencer
  11. Problem A: sequence analysis requires computational capacity
      • Amazon Elastic Compute Cloud (EC2), pay-by-the-hour computing
      • cloud servers cost $0.085 - $2 per hour
      • max capacity 64GB RAM / 8 CPU (can boot hundreds of servers)
      World-wide data centers
      750 hours free for new users: aws.amazon.com/free/
      free compute for teaching: aws.amazon.com/grants/
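
To make the pay-by-the-hour figures on this slide concrete, here is a minimal Python sketch of the cost arithmetic. It only uses the hourly price range quoted above ($0.085 to $2); the function name, the rounding of partial hours up to a full hour, and the example workloads are assumptions made for illustration, not details given in the talk.

```python
import math


def estimate_cost_usd(hourly_rate_usd, hours, servers=1):
    """Estimate the rental cost of a group of identical cloud servers.

    Assumption: each started hour is billed as a full hour.
    """
    if hourly_rate_usd < 0 or hours < 0 or servers < 1:
        raise ValueError("rate and hours must be >= 0, servers must be >= 1")
    billed_hours = math.ceil(hours)
    return round(hourly_rate_usd * billed_hours * servers, 2)


if __name__ == "__main__":
    # One small server at the low end of the quoted range, for one day.
    print(estimate_cost_usd(0.085, 24))            # 2.04
    # Ten large servers at the high end of the range, for one day.
    print(estimate_cost_usd(2.00, 24, servers=10))  # 480.0
```

Under these assumptions, even a day of work on ten of the largest servers stays in the hundreds of dollars, which is the comparison the slide draws against buying and maintaining a cluster.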
  12. Cloud Computing and Virtualization
      • OS, software and data, pre-installed in Virtual Machine (VM)
      • cloud provider: hardware and virtualization layer
      • VM is a full-featured server in a single file
      • VM transfer on private cloud
      Credit: VMware Inc.
  13. Problem B: bioinformatics tools need software engineering expertise
      • VM with pre-installed software on the cloud
      • avoid compiling source code, or other software dependencies
      • rent computational capacity, on a pay as you go basis
      • run the VM on the closest Amazon data center
  14. Solving Problems A & B: Cloud BioLinux
      • Cloud BioLinux: publicly accessible VM on EC2
      • 100+ pre-installed bioinformatics tools
      • remote desktop for non-command line experts
      • you can create a cluster with Cloud BioLinux - CloudMan
      Krampis K, Booth T, Chapman B, Tiwari B, Bicak M, Field D, Nelson K. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community. BMC Bioinformatics. 2012 Mar 19; 13: 42.
  15. Accessing Cloud BioLinux
      http://aws.amazon.com/console
  16. Launch through the EC2 cloud console
  17. Amazon EC2 VM launch wizard
      cloudbiolinux.org
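
The slides above launch the VM through the web console wizard. The same launch could also be scripted; the sketch below shows one possible way using the boto3 library, which is not mentioned in the talk and is swapped in here purely for illustration. The image ID, instance type, key name, and region in the usage example are placeholders, not values from the slides: the actual Cloud BioLinux image ID has to be looked up (for example via cloudbiolinux.org).

```python
def build_launch_request(image_id, instance_type, key_name, count=1):
    """Build keyword arguments for the boto3 EC2 client's run_instances call."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,            # ID of the VM image to boot
        "InstanceType": instance_type,  # server size (RAM / CPU)
        "KeyName": key_name,            # name of an existing EC2 SSH key pair
        "MinCount": count,
        "MaxCount": count,
    }


def launch(request, region_name):
    """Start the instances and return their IDs (needs boto3 and AWS credentials)."""
    import boto3  # third-party library, imported only when a launch is requested

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


if __name__ == "__main__":
    # Placeholder values: replace them before calling launch().
    request = build_launch_request("ami-XXXXXXXX", "m1.large", "my-key-pair")
    print(request)
    # instance_ids = launch(request, region_name="us-east-1")
```

Keeping the request construction separate from the network call means the parameters can be inspected or tested without an AWS account.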
  18. (image-only slide)
  19. Cloud BioLinux desktop remote connection
      tinyurl.com/bootcloud1
      tinyurl.com/bootcloud2
  20. Cloud BioLinux desktop
  21. Cloud BioLinux desktop
  22. Data exchange on the cloud
      VM snapshots
  23. Cloud computing research at JCVI
      • open-source cloud platforms, fully compatible with Amazon EC2
      • active funding, NIAID viral genomics pipeline on cloud
      • end-to-end, sequence to assembly, annotation, visualization via Galaxy
      • run on Amazon, private cloud, or desktop
  24. Scriptable Cloud Infrastructures
      Fabric framework
      • Cloud BioLinux VM configuration in plain text
      • high-level configuration, software groups
      • each group lists individual bioinformatics tools
  25. Scriptable Cloud Infrastructures
      • Python Fabric leverages Linux packages (APT repositories)
      • mix and match software from repositories
      • share VM configuration as source code
      • clone across clouds
      Krampis K, Booth T, Chapman B, Tiwari B, Bicak M, Field D, Nelson K. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community. BMC Bioinformatics. 2012 Mar 19; 13: 42.
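
The two slides above describe a configuration made of high-level software groups, each naming individual tools that are installed from Linux package repositories. The Python sketch below illustrates that idea only. The group names, the package lists, and both helper functions are hypothetical and are not taken from the actual Cloud BioLinux configuration files; the sketch stops at building the install command, which a Fabric task would then run on the VM.

```python
import shlex

# Hypothetical configuration: group names and package lists are illustrative only.
SOFTWARE_GROUPS = {
    "assembly": ["velvet", "abyss"],
    "alignment": ["bwa", "bowtie", "samtools"],
    "phylogeny": ["mrbayes"],
}


def packages_for(selected_groups, groups=SOFTWARE_GROUPS):
    """Expand the selected groups into a de-duplicated, ordered package list."""
    packages = []
    for name in selected_groups:
        if name not in groups:
            raise KeyError(f"unknown software group: {name}")
        for package in groups[name]:
            if package not in packages:
                packages.append(package)
    return packages


def apt_install_command(packages):
    """Build a single non-interactive apt-get command for the given packages."""
    if not packages:
        raise ValueError("no packages selected")
    return "apt-get install -y " + " ".join(shlex.quote(p) for p in packages)


if __name__ == "__main__":
    chosen = packages_for(["assembly", "alignment"])
    print(apt_install_command(chosen))
```

Because the configuration is plain data, a lab could share it as source code, edit the selected groups, and rebuild an equivalent VM on another cloud, which is the "mix and match" and "clone across clouds" point made on the slide.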
  26. Scalable Data Analysis
      • Cloud BioLinux + CloudMan
      • dual role: Master / Worker
      • the Cloud BioLinux VM has CloudMan scripts that start more copies of itself
      • Grid Engine (SGE) cluster
      • http://usecloudman.org/
      Afgan, E., Chapman, B. et al. (2012). Using Cloud Computing Infrastructure with CloudBioLinux, CloudMan, and Galaxy. Current Protocols in Bioinformatics, 11-9.
  27. Goodies with Cloud BioLinux
  28. Goodies with Cloud BioLinux
  29. From sequencer to the cloud
      credit: basespace.illumina.com
  30. Acknowledgments
      • Cloud BioLinux community: cloudbiolinux.org, groups.google.com/group/cloudbiolinux
        Brad Chapman, Enis Afgan, Tim Booth, Mesude Bicak, Dawn Field
      • JCVI collaborators: Alex Richter, Ravi Sanka, Andrey Tovchigrechko, Johannes Goll, Karen Nelson, Bill Nierman, JCVI IT support
      • NIAID for funding: Maria Giovanni, Punam Mathur
      tinyurl.com/cloudboot1
      tinyurl.com/cloudboot2
      kkrampis@jcvi.org
      slideshare.com/agbiotec
      Thank you !