Bonnal bosc2010 bio_ruby

BioRuby
Project Update

Raoul J.P. Bonnal
r@bioruby.org
Life Science Informatics, Integrative Biology Program
Fondazione INGM, Italy

Co-authors: Toshiaki Katayama, Pjotr Prins, Mitsuteru Nakao, Christian M Zmasek, Naohisa Goto

11th Annual Bioinformatics Open Source Conference (BOSC) 2010
Boston, Massachusetts, USA

Introduction

BioRuby: a bioinformatics library for the Ruby language
Ruby:
• Object-oriented scripting language, functional and reflective
• Became popular through "Ruby on Rails"
• Created by Matz in 1993 in Japan

BioRuby & Platforms

Ruby Interpreter
• Ruby, RubyEE: performance
• JRuby: portability, Java libraries
Installation: gem install bio
Base layer: Operating Systems
Also shown on the platform diagram: BioLib, Cytoscape

History
Timeline 2008, 2009, 2010
• Code fest themes: WebServices (2008), Workflows (2009), SemanticWeb (2010)
• Releases: 1.3.0, 1.3.1, 1.4.0
• Source code in git
• GSoC 2009: phyloXML
• GSoC 2010: Ruby 1.9.2; NeXML I/O, RDF triples; infer gene duplications
• BOSC

GitHub:
http://github.com/bioruby/bioruby

GSoC references:
• Ruby 1.9.2 support of BioRuby (OBF)
• Develop an API for NeXML I/O, and RDF triples for BioRuby (NESCent)
• Implementation of algorithm to infer gene duplications in BioRuby (OBF)
• Implementing phyloXML support in BioRuby (NESCent)

BioRuby Features

Category: Modules
Object: sequence, pathway, tree, bibliography reference
Sequence manipulation: translation, alignment, location, mapping, feature table, molecular weight, design siRNA, restriction enzyme
Format: GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
Tool: BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
Phylogeny: PHYLIP, PAML, phyloXML, NEXUS, Newick
Web Service: NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
ODBA: BioSQL, BioFetch, indexed flat files
Shell: interactive environment for rapid bioinformatics analyses

Relevant New Features

Bio::SQL Interoperable storage of sequences -Raoul Bonnal-
require 'bio'
# Also needed: active_record (ORM)
# and your database adapter (MySQL, PostgreSQL, JDBC)
# your_host_name and your_accession are placeholders to fill in.
connection = Bio::SQL.establish_connection(
  { 'development' => { 'hostname' => your_host_name,
                       'database' => 'CoolBioSeqDB',
                       'adapter'  => 'jdbcmysql',
                       'username' => 'Raoul',
                       'password' => 'SmartPassword' } },
  'development')
# Read a GenBank file and store its entries:
my_storage = Bio::SQL::Biodatabase.find(:first)
genbank = Bio::GenBank.open('dbvrl1.gb')
genbank.each_entry do |gb|
  Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence,
                         :biodatabase => my_storage)
end

# Fetching an accession is easy:
Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
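
The nested settings hash above looks like the kind of hash a YAML file yields, so the connection settings could be kept outside the script. A minimal sketch, assuming a hypothetical inline YAML document with the same keys as the slide; the host name `localhost` and the helper `load_db_config` are illustrative and not part of BioRuby:

```ruby
require 'yaml'

# Hypothetical settings, mirroring the keys used on the slide.
DB_CONFIG_YAML = <<~YAML
  development:
    hostname: localhost
    database: CoolBioSeqDB
    adapter: jdbcmysql
    username: Raoul
    password: SmartPassword
YAML

# Illustrative helper (not part of BioRuby): parse the YAML text and
# check that the requested environment is present.
def load_db_config(yaml_text, env)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, "no settings for #{env}" unless config.key?(env)
  config
end

config = load_db_config(DB_CONFIG_YAML, 'development')
puts config['development']['adapter']
# The resulting hash could then be handed over as on the slide:
#   Bio::SQL.establish_connection(config, 'development')
```

In practice the YAML text would more likely be read from a file, which keeps passwords out of the analysis script.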


Bio::PhyloXML r/w by -Diana Jaunzeikare, Christian M Zmasek-
require 'bio' # needs libxml-ruby

# Create a parser
phyloxml = Bio::PhyloXML::Parser.new('example.xml')

# Consume the trees
phyloxml.each do |tree|
  puts tree.name
end

# Writing (tree2 is a tree built or parsed elsewhere)
writer = Bio::PhyloXML::Writer.new('my_tree.xml')
writer.write(tree2)

# Extract information
phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
phyloxml.each do |tree|
  tree.each_node do |node|
    print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
  end
end

Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.


Bio::FASTQ r/w Next Generation Sequencing FASTQ -Naohisa Goto-
require 'bio'
ff_fasta = Bio::FlatFile.open('filename.fasta')
ff_qual = Bio::FlatFile.open('filename.qual')

# Pair each FASTA entry with its quality entry and print it as FASTQ
while entry_fasta = ff_fasta.next_entry
  seq = entry_fasta.to_biosequence
  seq.quality_score_type = :phred
  seq.quality_scores = ff_qual.next_entry.data
  puts seq.output(:fastq,
                  :title => entry_fasta.definition)
end

● Formats supported: SOLEXA, ILLUMINA

Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
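
The FASTQ variants described by Cock et al. differ in how the quality line is encoded: Sanger stores Phred scores with ASCII offset 33, Illumina 1.3+ stores Phred scores with offset 64, and Solexa stores Solexa scores with offset 64. A minimal plain-Ruby sketch of that decoding follows. It is not the Bio::FASTQ implementation; the names `FASTQ_OFFSETS`, `decode_quality` and `solexa_to_phred` are made up for illustration:

```ruby
# ASCII offsets of the FASTQ variants (Cock et al., 2010).
FASTQ_OFFSETS = { sanger: 33, solexa: 64, illumina: 64 }.freeze

# Decode a quality string into integer scores for the given variant.
# For :sanger and :illumina these are Phred scores; for :solexa they
# are Solexa scores.
def decode_quality(quality_string, variant)
  offset = FASTQ_OFFSETS.fetch(variant)
  quality_string.each_byte.map { |byte| byte - offset }
end

# Convert a Solexa score to the nearest integer Phred score using
# Q_phred = 10 * log10(10^(Q_solexa / 10) + 1).
def solexa_to_phred(solexa_score)
  (10 * Math.log10(10**(solexa_score / 10.0) + 1)).round
end

puts decode_quality('II5!', :sanger).inspect
puts decode_quality('hhT@', :illumina).inspect
puts solexa_to_phred(0)
```

The same quality values appear as different characters in each variant, which is why a reader has to know (or be told) the variant before decoding.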


Bio::NCBI::REST example
require 'bio'
ncbi = Bio::NCBI::REST::ESearch.new
ncbi.search("nucleotide", "tardigrada")
ncbi.count("nucleotide", "tardigrada")
ncbi.nucleotide("tardigrada")
ncbi.taxonomy("tardigrada")
ncbi.pubmed("tardigrada", "reldate" => 365)
ncbi.pubmed("mammoth mitochondrial genome")
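
For orientation, calls like these map onto requests to the NCBI E-utilities ESearch service. The sketch below only builds such request URLs with the Ruby standard library and sends nothing. The helper `esearch_url` is hypothetical and is not how Bio::NCBI::REST is implemented:

```ruby
require 'uri'

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Illustrative helper (not part of BioRuby): build an ESearch URL for a
# database and a search term, with optional extra parameters such as
# reldate.
def esearch_url(db, term, options = {})
  params = { 'db' => db, 'term' => term }.merge(options)
  "#{ESEARCH_BASE}?#{URI.encode_www_form(params)}"
end

puts esearch_url('nucleotide', 'tardigrada')
puts esearch_url('pubmed', 'tardigrada', 'reldate' => 365)
puts esearch_url('pubmed', 'mammoth mitochondrial genome')
```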

Bio::TogoWS entry point for PDBj, NCBI, DDBJ, EBI, KEGG
require 'bio'
t = Bio::TogoWS::REST.new
puts t.entry('genbank', 'AF237819')
puts t.search('uniprot', 'lung cancer')

BioRuby is Agile
● OpenBio* developers are the stakeholders
● Faster iteration process
● Frequent meetings (mail, Skype/voice chat, IRC)
● Test everything (required for new features)
– Improve quality and maintainability, and guarantee portability
– Ruby Unit Testing Framework, RSpec
● GitHub
– Low barriers for new developers
– 32 forks and 100 people watching us

Agile Manifesto

Moving to Agile Programming
[Chart: number of tests and of tutorial lines for releases 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1 and 1.4.0; y-axis from 0 to 2500]

Refactoring
[Chart: number of files, classes, modules and methods for releases 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1 and 1.4.0; y-axis from 0 to 3500]

Ongoing Work
● Semantic Web (started at BioHackathon 2010)
● Expose data in RDF
● Consume SPARQL endpoints efficiently
● Ruby 1.9.2 support of BioRuby (GSoC & OBF)
● Improved performance
● Develop an API for NeXML I/O, and RDF triples for BioRuby (GSoC & NESCent)
● Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)

PlugIn system
● We want a BioRuby core that is stable on every OS
● But we also want to use experimental code ASAP
● BioRuby + BioRuby plugin + Rails: multiple applications with a single core and specific features
– User or Application
● Suggested guidelines for the plugin namespace
● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
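
The naming guideline can be made concrete with two small helpers. This is a sketch under the assumption that a plugin's short name maps directly onto a repository called bioruby-plugin-NAME; both helpers are hypothetical and are not part of the plugin system itself:

```ruby
PLUGIN_PREFIX = 'bioruby-plugin-'

# Hypothetical helper: repository name for a plugin short name.
def plugin_repository_name(short_name)
  "#{PLUGIN_PREFIX}#{short_name}"
end

# Hypothetical helper: short name for a repository name, or nil when
# the repository does not follow the bioruby-plugin-NAME convention.
def plugin_short_name(repository_name)
  return nil unless repository_name.start_with?(PLUGIN_PREFIX)

  name = repository_name.delete_prefix(PLUGIN_PREFIX)
  name.empty? ? nil : name
end

puts plugin_repository_name('graphics')
puts plugin_short_name('bioruby-plugin-graphics')
```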

PlugIn system
The plugin system will be delivered with the next BioRuby release
BioGraphics -Jan Aerts-
For biologists:
bioruby --plugin install graphics

For geeks:
bioruby --plugin install git://github.com/user/repo.git

It’s very experimental

What We Need

● Better integration with R
● Better support for data visualization (interpretation)
● Detailed Roadmap

Publications
BioRuby: Bioinformatics software for the Ruby programming language (submitted)
Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama

The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted)
Toshiaki Katayama et al.

Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010)
TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services, Nucleic Acids
Research, 2010, Vol. 38, No. suppl_2 W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010)

Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010).
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Nucleic Acids Res, 38(6), 1767-1771.

Over 24 articles use BioRuby in their analyses; check the up-to-date list:
http://bioruby.open-bio.org/wiki/Research_using_BioRuby

Acknowledgments
● BioRuby Team
– Toshiaki Katayama*
– Naohisa Goto*
– Pjotr Prins*
– Mitsuteru Nakao*
– Jan Aerts*
– Christian M Zmasek*
● Open Bioinformatics Foundation
● Database Center for Life Science
● Google Summer of Code
– All GSoC students
● NESCent, National Evolutionary Synthesis Center

* co-author
