Bonnal bosc2010 bio_ruby


Transcript

  • 1. BioRuby Project Update. Raoul J.P. Bonnal (r@bioruby.org), Life Science Informatics, Integrative Biology Program, Fondazione INGM, Italy. Co-authors: Toshiaki Katayama, Pjotr Prins, Mitsuteru Nakao, Christian M Zmasek, Naohisa Goto. 11th Annual Bioinformatics Open Source Conference (BOSC) 2010, Boston, Massachusetts, USA
  • 2. Introduction. BioRuby is a bioinformatics library for the Ruby language. Ruby: • object-oriented scripting language, functional and reflective • became popular through "Ruby on Rails" • created by Matz in 1993 in Japan
  • 3. BioRuby & Platforms [diagram] Ruby interpreters: Ruby, JRuby, RubyEE. Labels: performance, portability, Java libraries, operating systems. Install: gem install bio
  • 4. BioRuby & Platforms [same diagram, with BioLib added]
  • 5. BioRuby & Platforms [same diagram, with Cytoscape added]
  • 6. History [timeline, 2008 to 2010]
    Releases: 1.3.0, 1.3.1, 1.4.0
    Events: Code fest, BOSC, GSoC, move to git
    Themes: WebServices, Workflows, SemanticWeb
    GSoC topics: phyloXML; Ruby 1.9.2; NeXML I/O, RDF triples; inferring gene duplications
    GitHub: http://github.com/bioruby/bioruby
    GSoC references:
    ● Ruby 1.9.2 support of BioRuby (OBF)
    ● Develop an API for NeXML I/O and RDF triples for BioRuby (NESCent)
    ● Implementation of algorithm to infer gene duplications in BioRuby (OBF)
    ● Implementing phyloXML support in BioRuby (NESCent)
  • 7. BioRuby Features (Category: Modules)
    Object: sequence, pathway, tree, bibliography reference
    Sequence Manipulation: translation, alignment, location, mapping, feature table, molecular weight, siRNA design, restriction enzyme
    Format: GenBank, EMBL, UniProt, KEGG, PDB, MEDLINE, REBASE, FASTQ, GFF, MSF, ABIF, SCF, GCG, Lasergene, GEO SOFT, Gene Ontology
    Tool: BLAST, FASTA, EMBOSS, HMMER, InterProScan, GenScan, BLAT, Sim4, Spidey, MEME, ClustalW, MUSCLE, MAFFT, T-Coffee, ProbCons
    Phylogeny: PHYLIP, PAML, phyloXML, NEXUS, Newick
    Web Service: NCBI, EBI, DDBJ, KEGG, TogoWS, PSORT, TargetP, PTS1, SOSUI, TMHMM
    ODBA: BioSQL, BioFetch, indexed flat files
    Shell: interactive environment for rapid bioinformatics analyses
  • 8. Relevant New Features 1: Bio::SQL, interoperable storage of sequences (Raoul Bonnal)
      require 'bio'
      # also needs active_record (ORM)
      # and your database adapter (MySQL, PostgreSQL, JDBC)
      # your_host_name is a placeholder for your own database host
      connection = Bio::SQL.establish_connection(
        {'development' => {'hostname' => your_host_name,
                           'database' => 'CoolBioSeqDB',
                           'adapter'  => 'jdbcmysql',
                           'username' => 'Raoul',
                           'password' => 'SmartPassword'}},
        'development')
      # read a GenBank file and store each entry:
      my_storage = Bio::SQL::Biodatabase.find(:first)
      genbank = Bio::GenBank.open('dbvrl1.gb')
      genbank.each_entry do |gb|
        Bio::SQL::Sequence.new(:biosequence => gb.to_biosequence, :biodatabase => my_storage)
      end
      # fetching an accession is easy (your_accession is a placeholder):
      Bio::SQL.fetch_accession(your_accession).to_biosequence.output(:embl)
  • 9. Relevant New Features 2: Bio::PhyloXML read/write (Diana Jaunzeikare, Christian M Zmasek)
      require 'bio'
      # needs libxml-ruby
      # Create a parser
      phyloxml = Bio::PhyloXML::Parser.new('example.xml')
      # Consume the trees
      phyloxml.each do |tree|
        puts tree.name
      end
      # Writing (tree2 is a tree built or parsed elsewhere)
      writer = Bio::PhyloXML::Writer.new('my_tree.xml')
      writer.write(tree2)
      # Extract information
      phyloxml = Bio::PhyloXML::Parser.new('ncbi_taxonomy_mollusca.xml')
      phyloxml.each do |tree|
        tree.each_node do |node|
          print 'Scientific name: ', node.taxonomies[0].scientific_name, "\n"
        end
      end
    Han, M. V. and Zmasek, C. M. (2009). phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics, 10, 356.
  • 10. Relevant New Features 3: Bio::FASTQ read/write for Next Generation Sequencing FASTQ (Naohisa Goto)
      require 'bio'
      # pair a FASTA file with its quality file and emit FASTQ
      ff_fasta = Bio::FlatFile.open('filename.fasta')
      ff_qual = Bio::FlatFile.open('filename.qual')
      while entry_fasta = ff_fasta.next_entry
        seq = entry_fasta.to_biosequence
        seq.quality_score_type = :phred
        seq.quality_scores = ff_qual.next_entry.data
        puts seq.output(:fastq, :title => entry_fasta.definition)
      end
    ● Formats supported: Solexa, Illumina
    Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
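    The FASTQ variants named on this slide differ mainly in how quality scores are written as ASCII characters. The sketch below uses plain Ruby and no BioRuby calls; the helper names (fastq_offset, encode_quality, decode_quality, solexa_to_phred) are made up for illustration and are not part of the BioRuby API. It assumes the conventions described in Cock et al. (2010): Sanger files store PHRED scores with offset 33, Illumina 1.3+ files store PHRED scores with offset 64, and Solexa files store Solexa scores with offset 64.

```ruby
# Hypothetical helpers for illustration only; not BioRuby methods.

# ASCII offset used by each FASTQ variant.
def fastq_offset(variant)
  case variant
  when :sanger then 33
  when :solexa, :illumina then 64
  else raise ArgumentError, "unknown FASTQ variant: #{variant}"
  end
end

# Encode an array of integer scores as a FASTQ quality string.
def encode_quality(scores, variant)
  offset = fastq_offset(variant)
  scores.map { |q| (q + offset).chr }.join
end

# Decode a FASTQ quality string back into integer scores.
def decode_quality(string, variant)
  offset = fastq_offset(variant)
  string.bytes.map { |b| b - offset }
end

# Convert a Solexa score to the nearest PHRED score:
# Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)
def solexa_to_phred(q_solexa)
  (10 * Math.log10(10**(q_solexa / 10.0) + 1)).round
end

scores = [40, 30, 20, 10]
puts encode_quality(scores, :sanger)    # each character is score + 33
puts encode_quality(scores, :illumina)  # each character is score + 64
puts solexa_to_phred(0)                 # low Solexa scores map to slightly higher PHRED scores
```

    The same four scores give different strings under the two offsets, which is why a reader has to know the variant before interpreting a quality line.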
  • 11. Relevant New Features 4: Bio::NCBI::REST example
      require 'bio'
      ncbi = Bio::NCBI::REST::ESearch.new
      ncbi.search("nucleotide", "tardigrada")
      ncbi.count("nucleotide", "tardigrada")
      ncbi.nucleotide("tardigrada")
      ncbi.taxonomy("tardigrada")
      ncbi.pubmed("tardigrada", "reldate" => 365)
      ncbi.pubmed("mammoth mitochondrial genome")
    Bio::TogoWS: entry point for PDBj, NCBI, DDBJ, EBI, KEGG
      require 'bio'
      t = Bio::TogoWS::REST.new
      puts t.entry('genbank', 'AF237819')
      puts t.search('uniprot', 'lung cancer')
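    For readers unfamiliar with the NCBI service behind the ESearch calls, the sketch below builds the kind of URL that the public E-utilities ESearch endpoint accepts. It uses only Ruby's standard library and makes no network request. The helper name esearch_url is hypothetical, and how BioRuby itself assembles its requests is not shown on the slide, so this illustrates the public URL scheme rather than the library's implementation.

```ruby
require 'uri'

# Hypothetical helper for illustration only; not a BioRuby method.
# Builds an NCBI E-utilities ESearch URL for a database and search term.
# Extra parameters (for example 'reldate') are merged into the query string.
def esearch_url(db, term, extra = {})
  params = { 'db' => db, 'term' => term }.merge(extra)
  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?#{URI.encode_www_form(params)}"
end

puts esearch_url('nucleotide', 'tardigrada')
puts esearch_url('pubmed', 'mammoth mitochondrial genome', { 'reldate' => 365 })
```

    Spaces in the search term are form-encoded as "+", so multi-word queries such as the PubMed example remain valid URLs.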
  • 12. BioRuby is Agile ● OpenBio* developers are the stakeholders ● Faster iteration process ● Frequent meetings (mail, Skype/voice chat, IRC) ● Test everything (tests are required for new features) – Improves quality and maintainability, and guarantees portability – Ruby unit testing framework, RSpec ● GitHub ● Low barriers for new developers ● 32 forks and 100 people watching us (see also: the Agile Manifesto)
  • 13. Moving to Agile Programming [chart: number of tests and tutorial lines per release, 1.0.0 to 1.4.0; y-axis 0 to 2500]
  • 14. Refactoring [chart: number of files, classes, modules and methods per release, 1.0.0 to 1.4.0; y-axis 0 to 3500]
  • 15. Ongoing Work ● Semantic Web (started at BioHackathon 2010) ● Expose data in RDF ● Consume SPARQL endpoints efficiently ● Ruby 1.9.2 support of BioRuby (GSoC & OBF) ● Improved performance ● Develop an API for NeXML I/O and RDF triples for BioRuby (GSoC & NESCent) ● Implementation of algorithm to infer gene duplications in BioRuby (GSoC & OBF)
  • 16. PlugIn system ● We want a BioRuby core that is stable on every OS ● But we also want to use experimental code as soon as possible ● With BioRuby + BioRuby plugins + Rails we can have multiple applications sharing a single core, each with its own specific features – User or Application ● Suggested guidelines for plugin namespaces ● On GitHub you can find our plugins by searching for bioruby-plugin-NAME
  • 17. PlugIn system. The plugin system will be delivered with the next BioRuby release. Example plugin: BioGraphics (Jan Aerts).
      For biologists: bioruby --plugin install graphics
      For geeks: bioruby --plugin install git://github.com/user/repo.git
    It is very experimental.
  • 18. What We Need ● Better integration with R ● Better support for data visualization (interpretation) ● Detailed Roadmap
  • 19. Publications
    ● Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama. BioRuby: Bioinformatics software for the Ruby programming language (submitted).
    ● Toshiaki Katayama et al. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows (accepted).
    ● Toshiaki Katayama, Mitsuteru Nakao and Toshihisa Takagi (2010). TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Research, Vol. 38, No. suppl_2, W706-W711, doi:10.1093/nar/gkq386 (Web Server Issue 2010).
    ● Cock, P. J., Fields, C. J., Goto, N., Heuer, M. L., and Rice, P. M. (2010). The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res, 38(6), 1767-1771.
    Over 24 articles use BioRuby in their analyses; check the up-to-date list: http://bioruby.open-bio.org/wiki/Research_using_BioRuby
  • 20. Acknowledgments
    ● BioRuby Team: Toshiaki Katayama*, Naohisa Goto*, Pjotr Prins*, Mitsuteru Nakao*, Jan Aerts*, Christian M Zmasek*
    ● Open Bioinformatics Foundation
    ● Database Center for Life Science
    ● Google Summer of Code: all GSoC students
    ● NESCent, National Evolutionary Synthesis Center
    * co-author