Brief introduction of aLeaves


Slides from an oral presentation given by Shigehiro Kuraku at 'Sequence Informatics Afternoon', an internal event organized by the Genome Resource and Analysis Unit of RIKEN CDB in April 2014.

Published in: Science, Technology


  1. aLeaves: web server for handy phylogenetic analysis. Shigehiro Kuraku, Unit Leader, Genome Resource & Analysis Unit, RIKEN CDB. The extended version of this presentation, as well as its Japanese version, is available at SlideShare.
  2. Tutorial movies available: “Collecting amino acid sequences and building a phylogenetic tree on the aLeaves and MAFFT servers” (English) and “Inferring a phylogenetic tree from a single amino acid sequence using aLeaves and MAFFT” (Japanese; original title: 「aLeavesとMAFFTを使って1つのアミノ酸配列から系統樹を推定する」).
  3. Motivation of aLeaves development. Background: although we have access to various methods for molecular phylogenetic tree inference and to abundant sequence data from large-scale sequencing projects, building a phylogenetic tree is still cumbersome for biologists working in labs. Goal: launch an online tool that performs comprehensive sequence searches across scattered large-scale resources and systematically slims down the data using biologist-friendly cues.
  4. What is hidden paralogy? Example: zebrafish Emx3 (Morita et al., 1995; Derobert et al., 2002, etc.). Reviewed in Kuraku, 2010, Integ. Comp. Biol.
  5. What is hidden paralogy? Example: zebrafish Emx3 (Morita et al., 1995; Derobert et al., 2002, etc.). Reviewed in Kuraku, 2010, Integ. Comp. Biol.
  6. How do you prepare a homolog set? A) Exhaustive search of homologs. B) Heuristic collection.
  7. Using the BLAST server at NCBI: “Every BLAST search is an experiment” by
  8. Scattered information hinders our work. Datasets: Ensembl; NCBI Protein (annotated); individual web sites of genome projects; your sequences; NCBI RefSeq (annotated); Ensembl Metazoa.
  9. Collaborators: Osamu Nishimura (GRAS, RIKEN CDB); Kazutaka Katoh (CBRC, AIST & iFReC, Osaka Univ.); Christian M. Zmasek (Sanford-Burnham Medical Research Institute, USA).
  10. aLeaves: enter a peptide query sequence; a single search covers diverse species; the output is a multi-FASTA sequence file, returned in several minutes.
  11. Taxonomic coverage (1)
  12. Taxonomic coverage (2)
  13. Downstream analysis on the MAFFT server (managed by K. Katoh). Systematic selection/deletion of sequences based on various criteria:
      ・Sequence length filter
      ・Delete identical/similar sequences (CD-HIT)
      ・Delete sequences with large gaps (MaxAlign)
      ・Select only particular species
      ・Select/delete particular subgroups in a guide tree
  14. Workflow using aLeaves-MAFFT. Heuristic route: heuristic identification of homologs (in publications, etc.) → retrieval of a limited number of sequences → phylogenetic tree inference. aLeaves-MAFFT route: exhaustive collection of homologs (on the aLeaves server at CDB, RIKEN) → careful refinement of the data set by deleting unnecessary sequences (on the MAFFT server at CBRC, AIST) → phylogenetic tree inference.
  15. Warning
      ・aLeaves is based on sequence resources already made public in other online databases and does not release original sequence information.
      ・The aLeaves project does not predict or validate protein-coding sequences; it simply adopts those available at other web sites for integrative searches.
      ・The aLeaves-MAFFT link allows you to refine a sequence data set and perform a preliminary molecular phylogenetic analysis, but please download the data set and perform more sophisticated analyses on your local system.
  16. Citing aLeaves
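
Slide 15 recommends running more careful analyses locally on the downloaded data set. As a rough illustration of two of the selection criteria listed on slide 13 (the sequence length filter and the removal of identical sequences), the following Python sketch filters a multi-FASTA file of the kind aLeaves outputs. It is not the code used by aLeaves or the MAFFT server: the function names and length thresholds are hypothetical, and the deduplication step only removes exact duplicates, whereas CD-HIT also clusters similar sequences.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse multi-FASTA text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records: List[Record], min_len: int, max_len: int) -> List[Record]:
    """Keep sequences whose length lies in [min_len, max_len] (hypothetical thresholds)."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


def drop_identical(records: List[Record]) -> List[Record]:
    """Keep only the first record for each exact sequence (exact matches only)."""
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to multi-FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


# Made-up toy sequences, for illustration only.
example = ">a sp1\nMKT\nLLV\n>b sp2\nMKTLLV\n>c sp3\nMK\n>d sp4\nMKTLLVAAAA\n"
refined = drop_identical(filter_by_length(parse_fasta(example), 3, 8))
print(to_fasta(refined), end="")
```

In practice the thresholds would be chosen relative to the query length, and the refined file would then be aligned before tree inference.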