Arakawa_Glanguage_BOSC2009


  1. Kazuharu Arakawa. Institute for Advanced Biosciences, Keio University; Graduate School of Media and Governance. Expertise: Bioinformatics, Systems Biology
  2. G-language Web Service Interface. Kazuharu Arakawa, Nobuhiro Kido, Kazuki Oshita, Masaru Tomita. Institute for Advanced Biosciences, Keio University. 2009.06.27
  3. G-language Project
     • First release in 2001 (now 1.8.8)
     • Perl library, interactive shell, 100+ applications, GUI
     • Focus on analysis of bacterial genomes
     • Compatible with BioPerl (10~50x faster for manipulating genome flatfiles)
     • http://www.g-language.org/
     References: Arakawa et al. (2003) Bioinformatics; Arakawa et al. (2006) Journal of Pesticide Science; Arakawa et al. (2008) Genes Genomes Genomics; Arakawa et al. (2009) BMC Bioinformatics
  4. 100+ functions: BAS_engine _eri_reader blastcutting graphical_LTR_search redundancy_fasta BAS_parser _eri_update_with_kegg blastparser icdi redundancy_sim4 BAS_scripter _fasta bui leading_strand rep_ori_ter CHI_engine _file_list_for_mapping cai load_kegg_api rmpolya CHI_parser _find_bad_substance calc_pI load_kegg_api3 rscu CHI_scripter _find_pathway_gap cap3_parse load_rcluster run_glimmerM COMGA_correlation _foreach_blastpointer_for_mapping cbi longest_ORF sdb_load COMGA_engine _foreach_mask_repeat_for_mapping cds_echo ma_filter sdb_save COMGA_parser _formatdb cei ma_normalize seq2png COMGA_scripter _formatdb_for_mapping cluster ma_rfilter seqinfo COMGA_table_maker _gblaster codon_compiler mapping_blast2 set_cogpath DONT_USE_ERRO _h2v codon_counter mapping_sim4 set_goa GEMS_engine _hmmpfam codon_usage markov set_gpac GEMS_parser _jstat_for_STeP cognitor maskseq set_operon GEMS_scripter _jstat_for_mapping complement match_test shannon_cu KeySearch _key_printer consensus_z molecular_weight sim4_parse PubMedSearch _list_clusterer cum_gcskew msg_ask_interface splitprintseq RNAfold _list_sorter diffseq msg_error ss2er STeP_engine _makegaplist dignitor msg_gimv stderr STeP_parser _mask_repeat_for_mapping ecell msg_interface stdin STeP_scripter _oligomer_translation eliminate_atg msg_percent stdout _R_RNA_graph _over_lapping_printer eliminate_pat msg_progress substance_layout _R_base_graph _post_blast_clusterer enc msg_send substance_layout2 _STS_divider_for_STeP _print_tandem enzyme_layout msg_set_gimv test_gpac _STS_modifer_for_STeP _repeatmasker equitability msg_system_console translate _UniMultiGrapher _sdb_path er2eri msg_term_console usage_dist _UniUniGrapher _set_sdb_path fasta_parse oligomer_counter valid_CDS _acc2ftp_bacteria _sim4 file_maker opt_as_gb view_cds _base_printer _sts2pg_for_STeP file_maker_fasta opt_default w_value _blast _translate find_dnaAbox opt_get _blast_db_for_mapping _trf find_identical_gene opt_val _blast_for_mapping _value_printer find_king_of_gene ori_search _blast_tp_finder aa_codon_compiler find_ori_ter output_maker _blastpointer_for_mapping aa_codon_usage find_seq over_lapping_finder _cap3 aaui find_tandem palindrome _clustalw alignment fop pasteseq _codon_amino_printer amino_counter foreach_RNAfold peptide_mass _codon_table amino_info foreach_tandem phx _codon_usage_printer annotate_with_glimmerM form_sim4 plasmid_map _codon_usage_table atcgcon funcD print_gene_function_list _complement base_counter gcskew pseudo_atg _csv_h2v base_entropy gcwin qstat _cutquery_for_mapping base_individual_information_matrix genome_map qsub _distance_cu base_information_content genome_map2 query_strand _ePCR_for_STeP base_relative_entropy genomicskew read_goa _ecell_name2kegg_compound base_z_value gopac redundancy _eri_extracter blast_parse gpac redundancy_cap3
  5. Perl API: BioPerl vs G

     BioPerl:

       use Bio::SeqIO;
       $in = Bio::SeqIO->new(-file => "ecoli.gbk", -format => 'GenBank');
       $seq = $in->next_seq();
       foreach $feat ($seq->all_SeqFeatures()) {
           next unless ($feat->primary_tag eq 'CDS');
           print $feat->each_tag_value("note"), "\n";
       }

       use Bio::DB::GenBank;
       use Bio::Seq;
       $gb = new Bio::DB::GenBank;
       $seq = $gb->get_Seq_by_acc("NC_000913");

     G:

       use G;
       $gb = load ecoli;    # $gb = load("genbank:NC_000913");
       foreach $cds ($gb->cds()) {
           say $gb->{$cds}->{note};
       }
  6. Interactive Shell
     • Fully functional Perl shell
     • Basic UNIX commands
     • Mix of the above (weird)
       • print togoWS('NC_000908') |head -n 10 |wc > out.txt
     • Tab completion (files, functions), history, editing with Emacs key bindings
     • Persistent data
     • Logging
     • Search for functions (like wossname in EMBOSS) and read documentation (like tfm in EMBOSS), both for the G-language API and BioPerl classes
     • Database search (NCBI, KEGG, UniProt ... and more)
     • Sequence and data retrieval
  7. Web Service Interface - Overview. Development supported by BioHackathon 2009 in Okinawa, Japan
  8. REST Interface: http://rest.g-language.org, http://useG.jp
     1. Accessing genome flatfile data: http://useG.jp/[species]/[gene]/[feature]
        a. http://useG.jp/ecoli/ - Nucleotide composition of E. coli genome
        b. http://useG.jp/ecoli/recA - Feature information about recA gene
        c. http://useG.jp/ecoli/recA/start - Start position of recA gene
        d. http://useG.jp/ecoli/*/translation - Amino acid sequence of all genes (FASTA)
     2. Manipulating genome data: http://useG.jp/[species]/[gene]/[method]/[option=value]/...
        a. http://useG.jp/method_list/gb - List all available methods
        b. http://useG.jp/NC_000913/*/before_startcodon - Retrieve upstream sequence of all genes
     3. Genome sequence analysis: http://useG.jp/[species]/[method]/[option=value]/...
        a. http://useG.jp/method_list/ - List all available methods
        b. http://useG.jp/mgen/gcskew/window=1000/ - GC skew of M. genitalium with 1000 bp windows
        c. http://useG.jp/mgen/gcskew/cumulative=1/output=f/ - Get the raw GC skew result as CSV data
     4. Other methods (not requiring genome sequence input): http://useG.jp/[method]/[option=value]/...
        a. http://useG.jp/togoWS/C00001 - Retrieve KEGG C00001 through togoWS
        b. http://useG.jp/help/gcskew - Show manual for gcskew method
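The URL patterns on this slide are plain path segments followed by optional option=value segments, so they can be assembled with simple string joins. The following is a minimal sketch of that idea. The helper name build_rest_url and its calling convention are assumptions made for illustration and are not part of G-language. The trailing slash after options only mirrors the slide's examples and has not been checked against the service.

```perl
use strict;
use warnings;

# Base URL as shown on the slide.
my $BASE_URL = 'http://useG.jp';

# build_rest_url is a hypothetical helper, not part of G-language or Bio::Glite.
# Arguments are path segments; an optional trailing array reference holds
# option => value pairs, which are appended as "option=value" segments.
sub build_rest_url {
    my @segments = @_;
    my $options = (@segments && ref($segments[-1]) eq 'ARRAY')
        ? pop @segments
        : [];

    my @parts = @segments;
    for (my $i = 0; $i < @$options; $i += 2) {
        push @parts, join('=', $options->[$i], $options->[$i + 1]);
    }

    my $url = join('/', $BASE_URL, @parts);
    # Assumption: follow the slide's examples, which end option URLs with "/".
    $url .= '/' if @$options;
    return $url;
}

# prints http://useG.jp/ecoli/recA/start
print build_rest_url('ecoli', 'recA', 'start'), "\n";

# prints http://useG.jp/mgen/gcskew/window=1000/
print build_rest_url('mgen', 'gcskew', [window => 1000]), "\n";
```

Options are passed as an array reference rather than a hash so that their order in the URL is preserved, as in cumulative=1/output=f/.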
  9. AJAX/CGI Interface: http://ws.g-language.org/atelier/
  10. Bio::Glite - lightweight version using the REST service. 32 KB in size, only requires LWP::UserAgent. Easy install via "cpan Bio::Glite"
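To illustrate how a thin client can sit on top of the REST service with nothing more than LWP::UserAgent, here is a minimal sketch. It is not the Bio::Glite API: the package name RestClient, its constructor arguments, and the get_text method are all hypothetical. It also assumes that http://rest.g-language.org accepts the same paths as the http://useG.jp examples, which the slides suggest but do not state.

```perl
use strict;
use warnings;

# RestClient is a hypothetical name for illustration; it is not Bio::Glite.
package RestClient;

sub new {
    my ($class, %args) = @_;
    my $ua = $args{ua};
    unless ($ua) {
        # Load LWP::UserAgent only when no user agent object was supplied.
        require LWP::UserAgent;
        $ua = LWP::UserAgent->new(timeout => 30);
    }
    # Default base URL taken from the REST Interface slide.
    my $base = $args{base} || 'http://rest.g-language.org';
    return bless { ua => $ua, base => $base }, $class;
}

# Fetch a path such as 'help/gcskew' and return the response body as text.
# Dies with the HTTP status line if the request fails.
sub get_text {
    my ($self, $path) = @_;
    my $res = $self->{ua}->get("$self->{base}/$path");
    die 'Request failed: ' . $res->status_line . "\n" unless $res->is_success;
    return $res->decoded_content;
}

package main;

# Usage (needs network access and LWP::UserAgent; the service may no longer
# be reachable):
#   my $client = RestClient->new;
#   print $client->get_text('help/gcskew');
```

Accepting a user agent object in the constructor keeps the sketch usable offline, since any object with a compatible get method can stand in for LWP::UserAgent.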
  11. SOAP Service: http://soap.g-language.org/g-language.wsdl
  12. Works with Taverna2. 9 example workflows are already available at
  13. ISMB Posters
     • Poster U16: Command-line-based integration of online bioinformatics resources. Kazuki Oshita, Kazuharu Arakawa, Masaru Tomita
     • Poster U22: G-language Genome Analysis Environment Version 2: Integrated workbench for computational genome sequence analysis. Kazuharu Arakawa, Masaru Tomita
     • Poster X023: Automatic layout tool for large-scale metabolic pathway models based on KEGG Atlas and SBML/SBGN. Nobuhiro Kido, Nobuaki Kono, Kazuharu Arakawa, Masaru Tomita
     • Poster X031: Pathway Projector: Web-based Zoomable Pathway Browser using KEGG Atlas and Google Maps API. Nobuaki Kono, Kazuharu Arakawa, Nobuhiro Kido, Ryu Ogawa, Kazuki Oshita, Keita Ikegami, Satoshi Tamaki, Masaru Tomita
  14. Acknowledgements
     • Nobuhiro Kido developed the REST/CGI server
     • Kazuki Oshita developed the SOAP server
     • BioHackathon 2009, sponsored by DBCLS and OIST of Japan
     • Yamagata Prefectural Government and Tsuruoka City
