BioRuby Project Update
Raoul J.P. Bonnal
Bonnal bosc2010 bio_ruby

  1. BioRuby Project Update
     Raoul J.P. Bonnal, r@bioruby.org
     Life Science Informatics, Integrative Biology Program, Fondazione INGM, Italy
     Co-authors: Toshiaki Katayama, Pjotr Prins, Mitsuteru Nakao, Christian M Zmasek, Naohisa Goto
     11th Annual Bioinformatics Open Source Conference (BOSC) 2010, Boston, Massachusetts, USA
  2. Introduction
     BioRuby: a bioinformatics library for the Ruby language
     • Ruby is an object-oriented scripting language, functional and reflective
     • It became popular through "Ruby on Rails"
     • It was created by Matz in 1993 in Japan
  3. BioRuby & Platforms
     [Diagram. Labels: Ruby Interpreter; Performance; Portability; Ruby, JRuby, RubyEE; Java libraries; gem install bio; Operating Systems]
  4. BioRuby & Platforms
     [Same diagram with BioLib added. Labels: BioLib; Ruby Interpreter; Performance; Portability; Ruby, JRuby, RubyEE; Java libraries; gem install bio; Operating Systems]
  5. BioRuby & Platforms
     [Same diagram with Cytoscape added. Labels: Cytoscape; Ruby Interpreter; Performance; Portability; Ruby, JRuby, RubyEE; Java libraries; gem install bio; Operating Systems]
  6. History
     [Timeline: 2008, 2009, 2010]
     Themes: WebServices, Workflows, SemanticWeb
     Events: Code fest; releases 1.3.0, 1.3.1, 1.4.0; BOSC; GSoC (twice); move to git
     GSoC topics: phyloXML; Ruby 1.9.2; NeXML I/O, RDF triples; infer gene duplications
     GitHub: http://github.com/bioruby/bioruby
     GSoC references:
     • Ruby 1.9.2 support of BioRuby (OBF)
     • Develop an API for NeXML I/O, and RDF triples for BioRuby (NESCent)
     • Implementation of algorithm to infer gene duplications in BioRuby (OBF)
     • Implementing phyloXML support in BioRuby (NESCent)
  7. BioRuby Features
     Category               Modules
     Object                 Sequence, pathway, tree, bibliography reference
     Sequence Manipulation  translation, alignment, location, mapping, feature table, molecular weight, design siRNA, restriction enzyme
     Format                 GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
     Tool                   BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
     Phylogeny              PHYLIP, PAML, phyloXML, NEXUS, Newick
     Web Service            NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
     ODBA                   BioSQL, BioFetch, indexed flat files
     Shell                  Interactive environment for rapid bioinformatics analyses
  8. Relevant New Features 1
     Bio::SQL: interoperable storage of sequences (Raoul Bonnal)

       require 'bio'
       # requires active_record (ORM)
       # requires your database adapter (MySQL, PostgreSQL, JDBC)
       connection = Bio::SQL.establish_connection(
         { 'development' => { 'hostname' => your_host_name,
                              'database' => 'CoolBioSeqDB',
                              'adapter'  => 'jdbcmysql',
                              'username' => 'Raoul',
                              'password' => 'SmartPassword' } },
         'development')

       # read a GenBank file and store its entries
       my_storage = Bio::SQL::Biodatabase.find(:first)
       genbank = Bio::GenBank.open('dbvrl1.gb')
       genbank.each_entry do |gb|
         Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence,
                                :biodatabase => my_storage)
       end

       # fetching an accession is easy
       Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
  9. Relevant New Features 2
     Bio::PhyloXML: read/write support (Diana Jaunzeikare, Christian M Zmasek)

       require 'bio'
       # requires libxml-ruby

       # Create a parser
       phyloxml = Bio::PhyloXML::Parser.new('example.xml')

       # Consume the trees
       phyloxml.each do |tree|
         puts tree.name
       end

       # Writing
       writer = Bio::PhyloXML::Writer.new('my_tree.xml')
       writer.write(tree2)

       # Extract information
       phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
       phyloxml.each do |tree|
         tree.each_node do |node|
           print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
         end
       end

     Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
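To make the data behind the parser calls on this slide more concrete, here is a minimal sketch that walks a tiny phyloXML document with REXML (shipped with standard Ruby installations) instead of Bio::PhyloXML. The sample document, the species in it, and the `scientific_names` helper are assumptions made for illustration; only the element names (`phylogeny`, `clade`, `taxonomy`, `scientific_name`) come from the phyloXML format itself.

```ruby
require "rexml/document"

# Illustrative sample document (an assumption, not taken from the slides).
PHYLOXML = <<~XML
  <phyloxml xmlns="http://www.phyloxml.org">
    <phylogeny rooted="true">
      <name>example tree</name>
      <clade>
        <clade><taxonomy><scientific_name>Octopus vulgaris</scientific_name></taxonomy></clade>
        <clade><taxonomy><scientific_name>Sepia officinalis</scientific_name></taxonomy></clade>
      </clade>
    </phylogeny>
  </phyloxml>
XML

# Hypothetical helper: collects the text of every <scientific_name>
# element, in document order.
def scientific_names(xml)
  names = []
  REXML::Document.new(xml).root.each_recursive do |element|
    names << element.text.strip if element.name == "scientific_name"
  end
  names
end

scientific_names(PHYLOXML).each { |name| puts "Scientific name: #{name}" }
```

Bio::PhyloXML additionally maps clades to tree nodes and exposes typed accessors such as `taxonomies`, which this sketch does not attempt.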
  10. Relevant New Features 3
      Bio::FASTQ: read/write Next Generation Sequencing FASTQ (Naohisa Goto)

        require 'bio'
        ff_fasta = Bio::FlatFile.open('filename.fasta')
        ff_qual  = Bio::FlatFile.open('filename.qual')
        while entry_fasta = ff_fasta.next_entry
          seq = entry_fasta.to_biosequence
          seq.quality_score_type = :phred
          seq.quality_scores = ff_qual.next_entry.data
          puts seq.output(:fastq, :title => entry_fasta.definition)
        end

      ● Formats supported: SOLEXA, ILLUMINA
      Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
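The slide merges a FASTA entry and its quality scores into a FASTQ record. As a rough picture of what such a record contains, the sketch below formats one Sanger-style record in plain Ruby, encoding each Phred score q as the character with ASCII code q + 33. The `fastq_record` helper and the sample read are assumptions for illustration and are not BioRuby's implementation.

```ruby
# Sanger FASTQ stores a Phred score q as the character with code q + 33.
PHRED_OFFSET = 33

# Hypothetical helper (not part of BioRuby): formats one Sanger FASTQ record.
def fastq_record(title, sequence, phred_scores)
  unless sequence.length == phred_scores.length
    raise ArgumentError, "sequence and quality lengths differ"
  end

  quality = phred_scores.map do |q|
    # Sanger FASTQ covers Phred 0..93, i.e. printable ASCII 33..126.
    raise ArgumentError, "Phred score out of range: #{q}" unless (0..93).cover?(q)
    (q + PHRED_OFFSET).chr
  end.join

  "@#{title}\n#{sequence}\n+\n#{quality}\n"
end

puts fastq_record("read1", "ACGT", [40, 30, 20, 10])
```

The Solexa and Illumina variants mentioned on the slide use different offsets or score definitions, so this sketch covers only the Sanger case.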
  11. Relevant New Features 4
      Bio::NCBI::REST example

        require 'bio'
        ncbi = Bio::NCBI::REST::ESearch.new
        ncbi.search("nucleotide", "tardigrada")
        ncbi.count("nucleotide", "tardigrada")
        ncbi.nucleotide("tardigrada")
        ncbi.taxonomy("tardigrada")
        ncbi.pubmed("tardigrada", "reldate" => 365)
        ncbi.pubmed("mammoth mitochondrial genome")

      Bio::TogoWS: entry point for PDBj, NCBI, DDBJ, EBI, KEGG

        require 'bio'
        t = Bio::TogoWS::REST.new
        puts t.entry('genbank', 'AF237819')
        puts t.search('uniprot', 'lung cancer')
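Bio::NCBI::REST wraps NCBI's E-utilities. As a hedged illustration of the kind of HTTP request that sits behind calls such as `ncbi.search` and `ncbi.pubmed`, the sketch below only builds an ESearch URL and sends nothing. The `esearch_url` helper is hypothetical, and the real wrapper may add further parameters that the slide does not show.

```ruby
require "uri"

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_ENDPOINT = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: builds the query URL for a database and search term.
# Extra options (for example "reldate" => 365) are appended as parameters.
def esearch_url(db, term, extra = {})
  params = { "db" => db, "term" => term }.merge(extra)
  "#{ESEARCH_ENDPOINT}?#{URI.encode_www_form(params)}"
end

puts esearch_url("nucleotide", "tardigrada")
puts esearch_url("pubmed", "tardigrada", "reldate" => 365)
puts esearch_url("pubmed", "mammoth mitochondrial genome")
```

Keeping URL construction separate from the network call makes the query easy to inspect before any request is issued.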
  12. BioRuby is Agile
      ● OpenBio* developers are the stakeholders
          ● Faster iteration process
          ● Frequent meetings (mail, Skype/voice chat, IRC)
      ● Test everything (required for new features)
          - Improve quality and maintainability, and guarantee portability
          - Ruby unit testing framework, RSpec
      ● GitHub
          ● Low barriers for new developers
          ● 32 forks and 100 people watching us
      Agile Manifesto
  13. Moving to Agile Programming
      [Chart: number of tests and tutorial lines (y-axis 0 to 2500) per BioRuby release: 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0]
  14. Refactoring
      [Chart: number of files, classes, modules and methods (y-axis 0 to 3500) per BioRuby release: 1.0.0, 1.1.0, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0]
  15. Ongoing Work
      ● Semantic Web (started @ BioHackathon 2010)
          ● Expose data in RDF
          ● Consume SPARQL endpoints efficiently
      ● Ruby 1.9.2 support of BioRuby (GSoC & OBF)
          ● Improved performance
      ● Develop an API for NeXML I/O, and RDF triples for BioRuby (GSoC & NESCent)
      ● Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)
  16. PlugIn system
      ● We want a BioRuby core that is stable on every OS
          ● But… we want to use experimental code ASAP
          ● BioRuby + BioRuby Plugin + Rails: we can have multiple applications with a unique core and specific features
              - User or Application
      ● Suggest guidelines for the plugin namespace
      ● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
  17. PlugIn system
      The plugin system will be delivered with the next BioRuby release.
      BioGraphics (Jan Aerts)
      For biologists: bioruby --plugin install graphics
      For geeks: bioruby --plugin install git://github.com/user/repo.git
      It’s very experimental
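The two install commands above accept either a short plugin name or a git URL, and the previous slide gives the repository naming pattern bioruby-plugin-NAME. The following sketch shows one way such an argument could be classified. It is purely illustrative: `plugin_source` is a hypothetical helper and not the actual BioRuby plugin installer, whose behavior the slides do not describe.

```ruby
# Naming pattern taken from the slides: bioruby-plugin-NAME.
PLUGIN_PREFIX = "bioruby-plugin-"

# Hypothetical helper: decides whether the argument is a repository URL
# or a short name that maps onto the naming pattern.
def plugin_source(spec)
  if spec.match?(%r{\A(git|https?)://})
    { kind: :repository, location: spec }
  else
    { kind: :named, location: "#{PLUGIN_PREFIX}#{spec}" }
  end
end

p plugin_source("graphics")
p plugin_source("git://github.com/user/repo.git")
```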
  18. What We Need
      ● Better integration with R
      ● Better support for data visualization (interpretation)
      ● Detailed roadmap
  19. Publications
      BioRuby: Bioinformatics software for the Ruby programming language (submitted). Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama.
      The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted). Toshiaki Katayama et al.
      Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010). TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Research, Vol. 38, No. suppl_2, W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010).
      Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
      Over 24 articles use BioRuby in their analyses; see the up-to-date list: http://bioruby.open-bio.org/wiki/Research_using_BioRuby
  20. Acknowledgments
      ● BioRuby Team
          ● Toshiaki Katayama*
          ● Naohisa Goto*
          ● Pjotr Prins*
          ● Mitsuteru Nakao*
          ● Jan Aerts*
          ● Christian M Zmasek*
      ● All GSoC students
      Open Bioinformatics Foundation
      Database Center for Life Science
      Google Summer of Code
      NESCent (National Evolutionary Synthesis Center)
      * co-author