BioRuby
is an
open-source
project
BUT, I HAVE A QUESTION...
Aspects of the word ‘OPEN’
•OPEN for
redistribution
•OPEN for source
code access
•OPEN for
contribution
CENTRALIZED APPROACH
• Pros
–QC for stability and consistency
–easy to apply coding standards
–enables extensive tests and documentation
• Cons
–heavy burden on release managers
–longer process, less frequent releases
–lack of cutting-edge features
Two ways to participate in
BioRuby development
1. Be a committer
1. be a trusted contributor in the community
2. get an open-bio.org account
3. be a CVS/SVN committer
2. Send patches to (busy) core-members
1. wait for patch evaluation
2. wait for next release of BioRuby
Actions of BioRuby
•more OPEN for
source code
access
•more OPEN for
contribution
ACTION 1
Social Coding Using GitHub
In 2010, the BioRuby
project source repository
moved to GitHub
• Users can fork the code freely.
• Users still have to wait for
acceptance of pull-requests to get
their code incorporated into the
official repository.
DECENTRALIZED APPROACH
• Enables expanding BioRuby without
tweaking its stable core
• plug-ins are maintained by their authors
• Encourages ‘best practice’ using a tool
(biogem command)
– Standard directory structure
– version control using Git
– Using the RubyGems packaging system
– testing and documentation
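To make the RubyGems packaging step concrete, here is a minimal hand-written sketch of a gem specification for a plug-in. It is not the output of the biogem command. The name "bio-example", the file layout, and every metadata value are hypothetical placeholders, and the dependency on the "bio" gem (BioRuby) with that version bound is an assumption for illustration.

```ruby
require "rubygems"

# Conventional gem layout assumed by this sketch (hypothetical plug-in):
#   lib/bio-example.rb    plug-in code
#   test/                 unit tests
#   README, Rakefile      documentation and build tasks

# All values below are illustrative placeholders, not a real Biogem.
spec = Gem::Specification.new do |s|
  s.name          = "bio-example"
  s.version       = "0.1.0"
  s.summary       = "Example BioRuby plug-in"
  s.authors       = ["Plug-in Author"]
  s.license       = "MIT"
  s.files         = Dir["lib/**/*.rb"] + Dir["test/**/*.rb"]
  s.require_paths = ["lib"]
  # Assumed dependency: the plug-in extends BioRuby, packaged as the "bio" gem.
  s.add_runtime_dependency "bio", ">= 1.4.0"
end

puts spec.full_name # => bio-example-0.1.0
```

Because the plug-in declares its own dependency on BioRuby, it can be released on its author's schedule without touching the stable core.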
Biogems.info – a portal site for Biogem users
• rank in total downloads (with rank up and down)
• citation, current version
• date of the latest release, links to source code
• status of Travis continuous integration
• highly motivating (for me)
• Database / web-service API: bio ucsc api, intermine, eutils, sequenceserver, goruby, bio ensembl
• Wrapper: bio samtools, bio logger, bio bwa, bio signalp, bio sge, bio exportpred, bio tabix
• Application: scaffolder, genfrag, bio isoelectric point, bio phyta, bio tm hmm, dna sequence aligner, bio gag, bio kmer counter
• File Parser: bio gff3, bio assembly, bio blastxmlparser, bio faster, bio alignment, bio nexml, bio kb illumina, bio octopus, bio affy, bio dbsnp, bio rdf, bio hmmer model, bio hmmer3 report, bio pileup iterator, bio phyloxml
• Visualization: bio graphics
• Framework: bio ngs
• Toolbox: bio genomic interval, bio bigbio, bio hello, bio plasmoap, bio cnls screenscraper, bio data, bio aliphatic index, bio hydropathy, bio gngm
• Biogem Example: bio hello
• Biogem Collection: bio core
more than 60 Biogems...
Highlighted among the Biogems listed above:
• Database Access-related
• Next Generation Sequencing-related
Hiro Mishima
• NOT a core
developer of
BioRuby
• not a computer
scientist but a
dentist
• semi-dry biologist
• human geneticist
A query written with a fluent interface
require 'bio-ucsc'
Bio::Ucsc::Hg19.connect
result = Bio::Ucsc::Hg19::Snp131.find_by_name("rs56289060")
puts result.chrom # => "chr1"
SQL made easy
region = "chr17:7,579,614-7,579,700"
condition = Bio::Ucsc::Hg19::Snp131.with_interval(region).select(:name)
puts condition.to_sql
# prints:
SELECT name FROM `snp131`
WHERE (chrom = 'chr17' AND bin in (642,80,9,1,0)
AND ( (chromStart BETWEEN 7579613 AND 7579700)
OR (chromEnd BETWEEN 7579613 AND 7579700)
OR (chromStart <= 7579613 AND chromEnd >= 7579700) ));
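The numbers in the generated SQL can be reproduced by hand. The sketch below assumes the standard UCSC binning scheme (five bin levels, smallest bins of 128 kb, valid for coordinates below 512 Mb) and the usual conversion from a 1-based region string to a 0-based start. It is an independent illustration, not code taken from the Ruby UCSC API; the method and constant names are made up for this example.

```ruby
# Bin number offsets per level, from the finest (128 kb) to the coarsest
# (512 Mb) level: 585 = 512+64+8+1, 73 = 64+8+1, 9 = 8+1, then 1 and 0.
BIN_OFFSETS     = [585, 73, 9, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # 2**17 = 128 kb, size of the finest bins
BIN_NEXT_SHIFT  = 3  # each coarser level holds 8 bins of the level below

# Hypothetical helper: "chr17:7,579,614-7,579,700" -> ["chr17", 7579613, 7579700]
# (1-based inclusive start becomes a 0-based start; the end is unchanged).
def parse_region(region)
  chrom, range = region.delete(",").split(":")
  first, last = range.split("-").map { |s| Integer(s) }
  [chrom, first - 1, last]
end

# Hypothetical helper: every bin, at every level, that can hold a feature
# overlapping the 0-based half-open interval [chrom_start, chrom_end).
def overlapping_bins(chrom_start, chrom_end)
  start_bin = chrom_start >> BIN_FIRST_SHIFT
  end_bin   = (chrom_end - 1) >> BIN_FIRST_SHIFT
  bins = []
  BIN_OFFSETS.each do |offset|
    (start_bin..end_bin).each { |b| bins << offset + b }
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
  end
  bins
end

chrom, chrom_start, chrom_end = parse_region("chr17:7,579,614-7,579,700")
p [chrom, chrom_start, chrom_end]          # => ["chr17", 7579613, 7579700]
p overlapping_bins(chrom_start, chrom_end) # => [642, 80, 9, 1, 0]
```

Restricting the query to these bins lets the database use the indexed bin column to narrow the search before the coordinate comparisons are evaluated.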
FUTURE DIRECTION of BioGem
• QC by peer review is still important.
–ensures stability and quality of code and documentation
–educates plug-in authors
• R/Bioconductor has an excellent peer-review system
–good coding style and well-formatted documentation
–requires huge human resources and effort
Solutions would be…
• recommended collections
• Bio-Core (Raoul J.P. Bonnal)
• loose/casual peer-review
• need to draw up guidelines for
designing “good” biogems
ACKNOWLEDGMENTS
• All BioRuby contributors
• Ruby UCSC API
– Jan Aerts
• The BioRuby Panel
– Raoul Bonnal
– Naohisa Goto
– Francesco Strozzi
– Toshiaki Katayama
– Pjotr Prins
• Dept. of Human Genetics, Nagasaki Univ.
– Koh-ichiro Yoshiura
• Google Summer of Code students
• O|B|F – Open Bioinformatics Foundation