Arakawa_Glanguage_BOSC2009
Published in: Technology, Business

Transcript

  • 1. Kazuharu Arakawa
    Institute for Advanced Biosciences, Keio University
    Graduate School of Media and Governance
    Expertise: Bioinformatics, Systems Biology
  • 2. G-language Web Service Interface
    Kazuharu Arakawa, Nobuhiro Kido, Kazuki Oshita, Masaru Tomita
    Institute for Advanced Biosciences, Keio University
    2009.06.27
  • 3. G-language Project
      • First release in 2001 (now 1.8.8)
      • Perl library, interactive shell, 100+ applications, GUI
      • Focus on analysis of bacterial genomes
      • Compatible with BioPerl (10~50x faster for manipulating genome flatfiles)
      • http://www.g-language.org/
    References: Arakawa et al. (2003) Bioinformatics; Arakawa et al. (2006) Journal of Pesticide Science; Arakawa et al. (2008) Genes Genomes Genomics; Arakawa et al. (2009) BMC Bioinformatics
  • 4. BAS_engine _eri_reader blastcutting graphical_LTR_search redundancy_fasta BAS_parser _eri_update_with_kegg blastparser icdi redundancy_sim4 BAS_scripter _fasta bui leading_strand rep_ori_ter CHI_engine _file_list_for_mapping cai load_kegg_api rmpolya CHI_parser _find_bad_substance calc_pI load_kegg_api3 rscu CHI_scripter _find_pathway_gap cap3_parse load_rcluster run_glimmerM COMGA_correlation _foreach_blastpointer_for_mapping cbi longest_ORF sdb_load COMGA_engine _foreach_mask_repeat_for_mapping cds_echo ma_filter sdb_save COMGA_parser _formatdb cei ma_normalize seq2png COMGA_scripter _formatdb_for_mapping cluster ma_rfilter seqinfo COMGA_table_maker _gblaster codon_compiler mapping_blast2 set_cogpath DONT_USE_ERRO _h2v codon_counter mapping_sim4 set_goa GEMS_engine _hmmpfam codon_usage markov set_gpac GEMS_parser _jstat_for_STeP 100+ cognitor maskseq set_operon GEMS_scripter _jstat_for_mapping complement match_test shannon_cu KeySearch _key_printer consensus_z molecular_weight sim4_parse PubMedSearch _list_clusterer cum_gcskew msg_ask_interface splitprintseq RNAfold _list_sorter diffseq msg_error ss2er STeP_engine _makegaplist dignitor msg_gimv stderr STeP_parser _mask_repeat_for_mapping ecell msg_interface stdin STeP_scripter _oligomer_translation eliminate_atg msg_percent stdout _R_RNA_graph _over_lapping_printer eliminate_pat msg_progress substance_layout _R_base_graph _post_blast_clusterer enc msg_send substance_layout2 _STS_divider_for_STeP _print_tandem enzyme_layout msg_set_gimv test_gpac _STS_modifer_for_STeP _repeatmasker equitability msg_system_console translate _UniMultiGrapher _sdb_path er2eri msg_term_console usage_dist _UniUniGrapher _set_sdb_path fasta_parse oligomer_counter valid_CDS _acc2ftp_bacteria _sim4 file_maker opt_as_gb view_cds _base_printer _sts2pg_for_STeP file_maker_fasta opt_default w_value _blast _translate find_dnaAbox opt_get _blast_db_for_mapping _trf find_identical_gene opt_val _blast_for_mapping _value_printer 
find_king_of_gene ori_search _blast_tp_finder aa_codon_compiler find_ori_ter output_maker _blastpointer_for_mapping aa_codon_usage find_seq over_lapping_finder _cap3 aaui find_tandem palindrome _clustalw alignment fop pasteseq _codon_amino_printer amino_counter foreach_RNAfold peptide_mass _codon_table amino_info foreach_tandem phx _codon_usage_printer annotate_with_glimmerM form_sim4 plasmid_map _codon_usage_table atcgcon funcD print_gene_function_list _complement base_counter gcskew pseudo_atg _csv_h2v base_entropy gcwin qstat _cutquery_for_mapping base_individual_information_matrix genome_map qsub _distance_cu base_information_content genome_map2 query_strand _ePCR_for_STeP base_relative_entropy genomicskew read_goa _ecell_name2kegg_compound base_z_value gopac redundancy _eri_extracter blast_parse gpac redundancy_cap3
  • 5. Perl API: BioPerl vs G

        # BioPerl: read a GenBank file and print the note of every CDS
        use Bio::SeqIO;
        $in = Bio::SeqIO->new(-file=>"ecoli.gbk", '-format'=>'GenBank');
        $seq = $in->next_seq();
        foreach $feat ($seq->all_SeqFeatures()){
            next unless($feat->primary_tag eq 'CDS');
            print $feat->each_tag_value("note"), "\n";
        }

        # BioPerl: fetch a genome from GenBank by accession
        use Bio::DB::GenBank;
        use Bio::Seq;
        $gb = new Bio::DB::GenBank;
        $seq = $gb->get_Seq_by_acc("NC_000913");

        # G-language: load the genome and print the note of every CDS
        use G;
        $gb = load ecoli;   # $gb = load("genbank:NC_000913");
        foreach $cds ($gb->cds()){
            say $gb->{$cds}->{note};
        }
  • 6. Interactive Shell
      • fully functional Perl shell
      • basic UNIX commands
      • mix of the above (weird), e.g.
        print togoWS('NC_000908') |head -n 10 |wc > out.txt
      • tab completion (files, functions), history, editing with Emacs key bindings
      • persistent data
      • logging
      • search for functions (like wossname in EMBOSS) and reading documentation (like tfm in EMBOSS), both for the G-language API and BioPerl classes
      • database search (NCBI, KEGG, UniProt ... and more)
      • sequence and data retrieval
  • 7. Web Service Interface - Overview
    Development supported by BioHackathon 2009 in Okinawa, Japan
  • 8. REST Interface
    http://rest.g-language.org
    http://useG.jp
      1. Accessing genome flatfile data: http://useG.jp/[species]/[gene]/[feature]
         a. http://useG.jp/ecoli/ - Nucleotide composition of E. coli genome
         b. http://useG.jp/ecoli/recA - Feature information about recA gene
         c. http://useG.jp/ecoli/recA/start - Start position of recA gene
         d. http://useG.jp/ecoli/*/translation - Amino acid sequence of all genes (FASTA)
      2. Manipulating genome data: http://useG.jp/[species]/[gene]/[method]/[option=value]/...
         a. http://useG.jp/method_list/gb - List all available methods
         b. http://useG.jp/NC_000913/*/before_startcodon - Retrieve upstream sequence of all genes
      3. Genome sequence analysis: http://useG.jp/[species]/[method]/[option=value]/...
         a. http://useG.jp/method_list/ - List all available methods
         b. http://useG.jp/mgen/gcskew/window=1000/ - GC skew of M. genitalium with 1000 bp windows
         c. http://useG.jp/mgen/gcskew/cumulative=1/output=f/ - Get the raw GC skew result as CSV data
      4. Other methods (not requiring genome sequence input): http://useG.jp/[method]/[option=value]/...
         a. http://useG.jp/togoWS/C00001 - Retrieve KEGG C00001 through togoWS
         b. http://useG.jp/help/gcskew - Show manual for gcskew method
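    The gcskew calls on this slide return GC skew per window. As a rough illustration of what such a method computes, the Python sketch below uses the common definition (G - C) / (G + C) over non-overlapping windows, plus a running sum for the cumulative variant. It illustrates the general technique and is not G-language's actual implementation; the function names and the toy sequence are hypothetical.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Return (G - C) / (G + C) for each non-overlapping window of seq.

    Assumption: common textbook definition, not necessarily the exact
    formula or windowing used by the G-language gcskew method.
    A shorter final window is still reported; windows without G or C
    yield 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(seq, window=1000):
    """Running sum of the per-window skew values."""
    return list(accumulate(gc_skew(seq, window)))


if __name__ == "__main__":
    toy = "GGGC" + "GCCC" + "ATAT"  # hypothetical 12-base sequence
    print(gc_skew(toy, window=4))             # [0.5, -0.5, 0.0]
    print(cumulative_gc_skew(toy, window=4))  # [0.5, 0.0, 0.0]
```

    In bacterial genomes the sign of the skew tends to flip near the replication origin and terminus, which is why a windowed or cumulative profile is a typical first analysis.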
  • 9. AJAX/CGI Interface http://ws.g-language.org/atelier/
  • 10. Bio::Glite - lightweight version using the REST service
      • 32 KB in size, only requires LWP::UserAgent
      • easy install via "cpan Bio::Glite"
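    A thin client of this kind mainly has to assemble URLs following the patterns on slide 8 and issue HTTP GET requests. The Python sketch below illustrates that idea; it is not the Bio::Glite API, and `build_url` and `fetch` are hypothetical helper names. The slides show URLs both with and without a trailing slash; this sketch omits it, and whether the server accepts either form (or is still reachable) is an assumption.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://useG.jp"  # REST endpoint given on slide 8


def build_url(*segments, **options):
    """Join path segments and option=value pairs into a REST URL.

    Hypothetical helper: segments are things like species, gene, or
    method; options become trailing "name=value" path components.
    """
    # Keep "*" literal because the slides use it as an all-genes wildcard.
    parts = [quote(str(s), safe="*") for s in segments]
    parts += [f"{quote(str(k))}={quote(str(v))}" for k, v in options.items()]
    return BASE_URL + "/" + "/".join(parts)


def fetch(url, timeout=30):
    """GET the URL and return the body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only print the URLs here; call fetch(url) to actually query the service.
    print(build_url("ecoli", "recA", "start"))
    print(build_url("mgen", "gcskew", window=1000))
    print(build_url("NC_000913", "*", "before_startcodon"))
```

    Keeping URL construction separate from the network call makes the URL logic easy to check offline.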
  • 11. SOAP Service http://soap.g-language.org/g-language.wsdl
  • 12. Works with Taverna2
    9 example workflows are already available at
  • 13. ISMB Posters
      • Poster U16: Command-line-based integration of online bioinformatics resources (Kazuki Oshita, Kazuharu Arakawa, Masaru Tomita)
      • Poster U22: G-language Genome Analysis Environment Version 2: Integrated workbench for computational genome sequence analysis (Kazuharu Arakawa, Masaru Tomita)
      • Poster X023: Automatic layout tool for large-scale metabolic pathway models based on KEGG Atlas and SBML/SBGN (Nobuhiro Kido, Nobuaki Kono, Kazuharu Arakawa, Masaru Tomita)
      • Poster X031: Pathway Projector: Web-based Zoomable Pathway Browser using KEGG Atlas and Google Maps API (Nobuaki Kono, Kazuharu Arakawa, Nobuhiro Kido, Ryu Ogawa, Kazuki Oshita, Keita Ikegami, Satoshi Tamaki, Masaru Tomita)
  • 14. Acknowledgements
      • Nobuhiro Kido developed the REST/CGI server
      • Kazuki Oshita developed the SOAP server
      • BioHackathon 2009, sponsored by DBCLS and OIST of Japan
      • Yamagata Prefectural Government and Tsuruoka City