Developing an open source community for cloud bioinformatics

Talk for Amazon workshop:

http://aws.amazon.com/genomics_workshop/

Published in: Technology, News & Politics

Transcript

  • 1. Developing an open source community for cloud bioinformatics
        Brad Chapman
        http://bcbio.wordpress.com/
        8 June 2010
  • 2. Overview
        1. Building open source bioinformatics communities is hard.
        2. Developer resources are a productive target.
        3. Framework: collaborative software images and data snapshots.
  • 3. Motivation
        Open source: OpenBio, Biopython
        Graduate school – developed distributed algorithm. Never reused.
        Work
          Startup: Automated biological pipelines.
          Research hospital: Democratization of analysis.
  • 4. Filters in biological computing
        Working in same biological area
        Interest in developing open source code
        Technical abilities
        Your software is good enough
  • 5. Successful bioinformatics
        Sean Eddy, HMMER: "...the best software in the field is often an unplanned labor of love from a single investigator."
        http://selab.janelia.org/people/eddys/blog/?p=313
  • 6. Recognizing contributions
  • 7. Successful community projects
        OpenBio: BioPerl, Biopython, BioJava
        Bioconductor
        Common theme: Aimed at developers. Biologists benefit indirectly.
  • 8. Lowering activation energy
  • 9. Establishing common platform
        The solution = to all our problems
        Remove install and distribution barriers
        Building block for scaling
  • 10. Existing cloud bioinformatics work
        JCVI Cloud BioLinux
        bioperl-max
        MachetEC2
        Debian Med
        Overlapping set of useful functionality.
  • 11. Integrated community solution
        Inclusive but configurable
        Easy to contribute
        Automated
        Bootstrap bare machine to fully ready distributed AMI.
        http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/
  • 12. Inclusive but configurable
        # Top level YAML configuration file specifying
        # groups of programs to be installed.
        packages:
          - python
          - r
          - erlang
          - databases
          - viz
          - bio_search
          - bio_alignment
          - bio_nextgen
          - bio_sequencing
          - bio_visualization
          - phylogeny
        libraries:
          - r-libs
          - python-libs
  • 13. Easy to contribute
        # Configuration file defining R specific libraries that
        # are installed via CRAN and Bioconductor.
        cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
        cran:
          - ggplot2
          - rjson
          - sqldf
          - NMF
          - ape
        biocrepo: http://bioconductor.org/biocLite.R
        bioc:
          - ShortRead
          - BSgenome
          - edgeR
          - GOstats
          - biomaRt
          - Rsamtools
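  • 14. Automated
        # Excerpt from the fabfile; helpers such as _read_main_config()
        # are defined elsewhere in that file and are not shown on the slide.
        from fabric.api import sudo

        def install_biolinux():
            ec2_ubuntu_environment()
            pkg_install, lib_install = _read_main_config()
            _apt_packages(pkg_install)
            _do_library_installs(lib_install)

        def _ruby_library_installer(config):
            # Install each Ruby gem listed in the configuration.
            for gem in config['gems']:
                sudo("gem install %s" % gem)

        Fabric: http://docs.fabfile.org/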
  • 15. Ready to use biological data
        % ls /referenceGenomes/
        Athaliana  Celegans  Dmelanogaster  Ecoli  Hsapiens  Mmusculus  Msmegmatis
        Mtuberculosis_H37Rv  Paeruginosa_UCBPP-PA14  phiX174  Rnorvegicus
        Scerevisiae  Xtropicalis
        % ls Hsapiens/hg18
        arachne  bowtie  bwa  eland  maq  seq  snps  ucsc
        http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
  • 16. Organization: Codefest 2010 www.open-bio.org/wiki/Codefest_2010
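
Slide 12 shows a top-level configuration that selects groups of programs by name. One way such a selection could be expanded into a concrete install list is sketched below. The group names come from the slide, but the packages listed inside each group and the function name `resolve_packages` are assumptions made for illustration, not taken from the talk's code.

```python
# Hypothetical sketch: group names are from slide 12; the contents of each
# group are invented here purely for illustration.
MAIN_CONFIG = {
    "packages": ["python", "bio_alignment", "bio_nextgen"],
    "libraries": ["r-libs", "python-libs"],
}

# Assumed mapping from a group name to the system packages it stands for.
PACKAGE_GROUPS = {
    "python": ["python-dev", "python-numpy"],
    "bio_alignment": ["bwa", "bowtie"],
    "bio_nextgen": ["samtools", "bwa"],
}


def resolve_packages(config, groups):
    """Flatten the selected groups into an ordered list without duplicates."""
    seen = set()
    resolved = []
    for group in config.get("packages", []):
        if group not in groups:
            raise KeyError("unknown package group: %s" % group)
        for pkg in groups[group]:
            if pkg not in seen:
                seen.add(pkg)
                resolved.append(pkg)
    return resolved


if __name__ == "__main__":
    print(resolve_packages(MAIN_CONFIG, PACKAGE_GROUPS))
```

Keeping the selection (which groups) separate from the definition (what a group contains) is what would let one image stay inclusive by default while remaining easy to trim.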
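
Slide 13 lists R libraries to install from CRAN and Bioconductor. The sketch below shows how a configuration of that shape might be turned into an R install script. It assumes the installer simply calls R's `install.packages` for the CRAN entries and sources the `biocLite.R` script named on the slide for the Bioconductor entries; the helper names `_r_vector` and `r_install_script` are hypothetical, and the package lists are shortened.

```python
# Keys and URLs mirror slide 13; the package lists are shortened.
R_CONFIG = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "edgeR"],
}


def _r_vector(names):
    """Render a list of names as an R character vector literal."""
    return "c(%s)" % ", ".join('"%s"' % name for name in names)


def r_install_script(config):
    """Build R code that installs the configured libraries (assumed approach)."""
    lines = []
    if config.get("cran"):
        lines.append('install.packages(%s, repos="%s")'
                     % (_r_vector(config["cran"]), config["cranrepo"]))
    if config.get("bioc"):
        lines.append('source("%s")' % config["biocrepo"])
        lines.append("biocLite(%s)" % _r_vector(config["bioc"]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(r_install_script(R_CONFIG))
```

With this layout, contributing a new library would only mean adding one line to the configuration file, which is the point the slide title makes.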
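
The automation on slide 14 runs shell commands on a remote machine through Fabric. A minimal sketch of the same idea without the Fabric dependency follows: commands are built as strings and handed to a `run` callable, which in a real fabfile would be Fabric's `sudo`. The function names, the `apt-get` invocation, and the example package and gem names are assumptions for illustration.

```python
def apt_install_command(packages):
    """Build a single non-interactive apt-get command for the packages."""
    return "apt-get -y install %s" % " ".join(packages)


def gem_install_commands(config):
    """Build one `gem install` command per configured Ruby gem."""
    return ["gem install %s" % gem for gem in config.get("gems", [])]


def install_all(packages, ruby_config, run):
    """Pass every command to `run`; with Fabric this would be its sudo()."""
    commands = [apt_install_command(packages)] if packages else []
    commands.extend(gem_install_commands(ruby_config))
    for command in commands:
        run(command)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them remotely.
    install_all(["bwa", "samtools"], {"gems": ["bio"]}, run=print)
```

Injecting the runner makes a dry run possible, so a contributor could check what a configuration change would install before building an image.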
