Developing an open source community for cloud bioinformatics
Brad Chapman
http://bcbio.wordpress.com/


Talk for Amazon workshop: http://aws.amazon.com/genomics_workshop/

Published in: Technology, News & Politics
Transcript of "Developing an open source community for cloud bioinformatics"

  1. Developing an open source community for cloud bioinformatics
     Brad Chapman
     http://bcbio.wordpress.com/
     8 June 2010

  2. Overview
     1  Building open source bioinformatics communities is hard.
     2  Developer resources are a productive target.
     3  Framework: collaborative software images and data snapshots.

  3. Motivation
     Open source
         OpenBio, Biopython
         Graduate school – developed distributed algorithm. Never reused.
     Work
         Startup: Automated biological pipelines.
         Research hospital: Democratization of analysis.

  4. Filters in biological computing
     Working in same biological area
     Interest in developing open source code
     Technical abilities
     Your software is good enough

  5. Successful bioinformatics
     Sean Eddy, HMMER
     "...the best software in the field is often an unplanned labor of love
     from a single investigator."
     http://selab.janelia.org/people/eddys/blog/?p=313

  6. Recognizing contributions

  7. Successful community projects
     OpenBio: BioPerl, Biopython, BioJava
     Bioconductor
     Common theme
         Aimed at developers. Biologists benefit indirectly.

  8. Lowering activation energy

  9. Establishing common platform
     [figure] = The solution to all our problems
     Remove install and distribution barriers
     Building block for scaling

  10. Existing cloud bioinformatics work
      JCVI Cloud BioLinux
      bioperl-max
      MachetEC2
      Debian Med
      Overlapping set of useful functionality.

  11. Integrated community solution
      Inclusive but configurable
      Easy to contribute
      Automated
      Bootstrap bare machine to fully ready distributed AMI.
      http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/

  12. Inclusive but configurable
      # Top level YAML configuration file specifying
      # groups of programs to be installed.
      packages:
        - python
        - r
        - erlang
        - databases
        - viz
        - bio_search
        - bio_alignment
        - bio_nextgen
        - bio_sequencing
        - bio_visualization
        - phylogeny
      libraries:
        - r-libs
        - python-libs

  13. Easy to contribute
      # Configuration file defining R specific libraries that
      # are installed via CRAN and Bioconductor.
      cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
      cran:
        - ggplot2
        - rjson
        - sqldf
        - NMF
        - ape
      biocrepo: http://bioconductor.org/biocLite.R
      bioc:
        - ShortRead
        - BSgenome
        - edgeR
        - GOstats
        - biomaRt
        - Rsamtools

  14. Automated
      def install_biolinux():
          ec2_ubuntu_environment()
          pkg_install, lib_install = _read_main_config()
          _apt_packages(pkg_install)
          _do_library_installs(lib_install)

      def _ruby_library_installer(config):
          for gem in config['gems']:
              sudo("gem install %s" % gem)

      Fabric: http://docs.fabfile.org/

  15. Ready to use biological data
      % ls /referenceGenomes/        % ls Hsapiens/hg18
      Athaliana                      arachne
      Celegans                       bowtie
      Dmelanogaster                  bwa
      Ecoli                          eland
      Hsapiens                       maq
      Mmusculus                      seq
      Msmegmatis                     snps
      Mtuberculosis_H37Rv            ucsc
      Paeruginosa_UCBPP-PA14
      phiX174
      Rnorvegicus
      Scerevisiae
      Xtropicalis
      http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py

  16. Organization: Codefest 2010
      www.open-bio.org/wiki/Codefest_2010
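
The top-level configuration on slide 12 names groups of programs rather than individual packages. As a rough illustration of how such a file might be consumed, the sketch below parses the "key:" / "- item" subset of YAML used on the slide and expands the selected groups into a flat package list. The slides do not show what each group contains, so the GROUP_TO_APT table and the helper names parse_simple_config and expand_groups are hypothetical. A real script would more likely use a full YAML parser.

```python
# Illustrative sketch only: the group contents below are invented, since the
# slides do not show the per-group package lists.
MAIN_CONFIG = """\
# Top level YAML configuration file specifying
# groups of programs to be installed.
packages:
  - python
  - bio_alignment
  - bio_nextgen
libraries:
  - r-libs
  - python-libs
"""

# Hypothetical mapping from a group name to the packages it pulls in.
GROUP_TO_APT = {
    "python": ["python-dev"],
    "bio_alignment": ["bwa", "bowtie"],
    "bio_nextgen": ["bowtie", "samtools"],
}


def parse_simple_config(text):
    """Parse the 'key:' / '- item' subset of YAML shown on the slide."""
    config = {}
    current_key = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if line.startswith("- "):
            if current_key is None:
                raise ValueError("list item before any key: %r" % raw_line)
            config[current_key].append(line[2:].strip())
        elif line.endswith(":"):
            current_key = line[:-1].strip()
            config[current_key] = []
        else:
            raise ValueError("unsupported line: %r" % raw_line)
    return config


def expand_groups(groups, group_table):
    """Flatten the selected groups into one de-duplicated package list."""
    packages = []
    for group in groups:
        if group not in group_table:
            raise KeyError("no package list defined for group %r" % group)
        for pkg in group_table[group]:
            if pkg not in packages:
                packages.append(pkg)
    return packages


config = parse_simple_config(MAIN_CONFIG)
apt_packages = expand_groups(config["packages"], GROUP_TO_APT)
print(apt_packages)
```

Keeping group selection separate from group contents is what lets a user comment out one line in the top-level file to skip an entire category of tools.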
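
The R library configuration on slide 13 could be turned into install commands along the following lines. This is a sketch, not the project's actual code: the function name r_install_script is hypothetical, and the config is written as a Python dict standing in for the parsed YAML. The generated R calls use install.packages and the biocLite script named on the slide. biocLite was the Bioconductor installer at the time of the talk and has since been replaced by BiocManager.

```python
# Dict standing in for the parsed slide-13 YAML file.
R_LIBS_CONFIG = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson", "sqldf", "NMF", "ape"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "BSgenome", "edgeR", "GOstats", "biomaRt",
             "Rsamtools"],
}


def _r_vector(names):
    """Format a list of package names as an R character vector."""
    return "c(%s)" % ", ".join('"%s"' % name for name in names)


def r_install_script(config):
    """Build R source text that installs the CRAN and Bioconductor packages.

    Hypothetical helper: the real installer may issue one command per package.
    """
    lines = []
    if config.get("cran"):
        lines.append('install.packages(%s, repos="%s")'
                     % (_r_vector(config["cran"]), config["cranrepo"]))
    if config.get("bioc"):
        lines.append('source("%s")' % config["biocrepo"])
        lines.append("biocLite(%s)" % _r_vector(config["bioc"]))
    return "\n".join(lines)


script = r_install_script(R_LIBS_CONFIG)
print(script)
```

With this layout, contributing a new R package amounts to adding one line to the cran or bioc list, which is the "easy to contribute" point the slide makes.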
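
Slide 14 shows one per-language installer, _ruby_library_installer, which calls Fabric's sudo once per gem. One way such installers might be dispatched is sketched below. This is an assumption-laden illustration: the INSTALLERS table, the "pypi" key, the pip command, and the example package names are hypothetical, and the command runner is passed in as a parameter so the sketch can run as a dry run without Fabric or a remote machine.

```python
def _ruby_library_installer(config, run):
    """Mirror of the slide's installer, with the runner passed in."""
    for gem in config["gems"]:
        run("gem install %s" % gem)


def _python_library_installer(config, run):
    """Hypothetical counterpart for Python libraries (not on the slides)."""
    for pkg in config["pypi"]:
        run("pip install %s" % pkg)


# Hypothetical table mapping a library group name to its installer.
INSTALLERS = {
    "ruby-libs": _ruby_library_installer,
    "python-libs": _python_library_installer,
}


def do_library_installs(lib_configs, run):
    """Run the matching installer for each configured library group."""
    for name, config in lib_configs.items():
        if name not in INSTALLERS:
            raise KeyError("no installer registered for %r" % name)
        INSTALLERS[name](config, run)


# Dry run: collect the commands instead of executing them. With Fabric 1.x
# installed, passing fabric.api.sudo as `run` would execute them remotely.
commands = []
do_library_installs(
    {
        "python-libs": {"pypi": ["biopython"]},  # example values only
        "ruby-libs": {"gems": ["bio"]},
    },
    commands.append,
)
print(commands)
```

Injecting the runner keeps the installers testable on a laptop, while the same functions can drive a real machine when given a remote-execution callable.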