  1. Kazuharu Arakawa
     Institute for Advanced Biosciences, Keio University
     Graduate School of Media and Governance
     Expertise: Bioinformatics, Systems Biology
  2. G-language Web Service Interface
     Kazuharu Arakawa, Nobuhiro Kido, Kazuki Oshita, Masaru Tomita
     Institute for Advanced Biosciences, Keio University
     2009.06.27
  3. G-language Project
     • First release in 2001 (now 1.8.8)
     • Perl library, interactive shell, 100+ applications, GUI
     • Focus on analysis of bacterial genomes
     • Compatible with BioPerl (10-50x faster for manipulating genome flatfiles)
     References:
     • Arakawa et al. (2003) Bioinformatics
     • Arakawa et al. (2006) Journal of Pesticide Science
     • Arakawa et al. (2008) Genes Genomes Genomics
     • Arakawa et al. (2009) BMC Bioinformatics
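  5. Perl API: BioPerl vs G
     BioPerl (read a local GenBank file and print the note of every CDS):
         use Bio::SeqIO;
         $in = Bio::SeqIO->new(-file => "ecoli.gbk", -format => 'GenBank');
         $seq = $in->next_seq();
         foreach $feat ($seq->all_SeqFeatures()) {
             next unless $feat->primary_tag eq 'CDS';
             next unless $feat->has_tag('note');   # skip CDS features without a /note qualifier
             print $feat->each_tag_value("note"), "\n";
         }
     BioPerl (retrieve the genome from GenBank by accession):
         use Bio::DB::GenBank;
         use Bio::Seq;
         $gb = Bio::DB::GenBank->new;
         $seq = $gb->get_Seq_by_acc("NC_000913");
     G (load and loop in one place):
         use G;
         $gb = load ecoli;   # $gb = load("genbank:NC_000913");
         foreach $cds ($gb->cds()) {
             say $gb->{$cds}->{note};
         }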
  6. Interactive Shell
     • Fully functional Perl shell
     • Basic UNIX commands
     • Mix of the above (weird), for example:
         print togoWS('NC_000908') | head -n 10 | wc > out.txt
     • Tab completion (files, functions), history, editing with Emacs key bindings
     • Persistent data
     • Logging
     • Search for functions (like wossname in EMBOSS) and read documentation (like tfm in EMBOSS), both for the G-language API and BioPerl classes
     • Database search (NCBI, KEGG, UniProt ... and more)
     • Sequence and data retrieval
  7. Web Service Interface - Overview
     Development supported by BioHackathon 2009 in Okinawa, Japan
  8. REST Interface
     1. Accessing genome flatfile data: [species]/[gene]/[feature]
        a. Nucleotide composition of the E. coli genome
        b. Feature information about the recA gene
        c. Start position of the recA gene
        d. */translation: amino acid sequences of all genes (FASTA)
     2. Manipulating genome data: [species]/[gene]/[method]/[option=value]/...
        a. List all available methods
        b. */before_startcodon: retrieve the upstream sequences of all genes
     3. Genome sequence analysis: [species]/[method]/[option=value]/...
        a. List all available methods
        b. GC skew of M. genitalium with 1000 bp windows
        c. Get the raw GC skew result as CSV data
     4. Other methods (not requiring genome sequence input): [method]/[option=value]/...
        a. Retrieve KEGG C00001 through togoWS
        b. Show the manual for the gcskew method
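The path patterns on this slide can be illustrated with a small Perl sketch that assembles request URLs from segments and option=value pairs. It is only an illustration under stated assumptions: the base URL is a placeholder (the real service address is not given here), the helper rest_url is hypothetical, and the use of accession numbers as [species], the feature name "start", and the option names "window" and "output" are guesses rather than documented parameters. Only the pattern shapes, the "*" wildcard, before_startcodon, and gcskew come from the slide.

```perl
use strict;
use warnings;

# Placeholder base URL (assumption): substitute the real service address.
my $BASE = 'http://rest.example.org';

# Hypothetical helper: join path segments, then append option=value pairs.
# Options are added in sorted key order so the resulting URL is deterministic.
# No URL escaping is done; this sketch assumes plain ASCII segments.
sub rest_url {
    my ($segments, $options) = @_;
    my @parts = @$segments;
    if ($options) {
        push @parts, map { "$_=$options->{$_}" } sort keys %$options;
    }
    return join '/', $BASE, @parts;
}

# 1. Flatfile data: [species]/[gene]/[feature]
#    ("start" as the feature name is an assumption)
my $start = rest_url(['NC_000913', 'recA', 'start']);

# 2. Data manipulation on all genes: [species]/*/[method]
my $upstream = rest_url(['NC_000913', '*', 'before_startcodon']);

# 3. Sequence analysis: [species]/[method]/[option=value]/...
#    ("window" and "output" are assumed option names)
my $skew = rest_url(['NC_000908', 'gcskew'], { window => 1000, output => 'csv' });

print "$_\n" for $start, $upstream, $skew;
```

Because every request is just a path, the same strings can be pasted into a browser or passed to any HTTP client, which is what makes a thin client on top of the REST service practical.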
  9. AJAX/CGI Interface
  10. Bio::Glite
     • Lightweight version using the REST service
     • 32 KB in size, only requires LWP::UserAgent
     • Easy install via "cpan Bio::Glite"
  11. SOAP Service
  12. Works with Taverna 2
     Nine example workflows are already available.
  13. ISMB Posters
     • Poster U16: Command-line-based integration of online bioinformatics resources
       Kazuki Oshita, Kazuharu Arakawa, Masaru Tomita
     • Poster U22: G-language Genome Analysis Environment Version 2: Integrated workbench for computational genome sequence analysis
       Kazuharu Arakawa, Masaru Tomita
     • Poster X023: Automatic layout tool for large-scale metabolic pathway models based on KEGG Atlas and SBML/SBGN
       Nobuhiro Kido, Nobuaki Kono, Kazuharu Arakawa, Masaru Tomita
     • Poster X031: Pathway Projector: Web-based Zoomable Pathway Browser using KEGG Atlas and Google Maps API
       Nobuaki Kono, Kazuharu Arakawa, Nobuhiro Kido, Ryu Ogawa, Kazuki Oshita, Keita Ikegami, Satoshi Tamaki, Masaru Tomita
  14. Acknowledgements
     • Nobuhiro Kido developed the REST/CGI server
     • Kazuki Oshita developed the SOAP server
     • BioHackathon 2009, sponsored by DBCLS and OIST of Japan
     • Yamagata Prefectural Government and Tsuruoka City