Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform


Dissecting plant genomes with the PLAZA comparative genomics platform.
Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K.

Plant Physiol. 2012 Feb;158(2):590-600.

With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
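
The phylogenetic profiling step mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical mapping from gene family identifiers to the set of species in which the family has at least one member; the function name, the toy data, and the `min_fraction` threshold are illustrative assumptions and not the procedure used by PLAZA.

```python
from typing import Dict, List, Set


def core_families(
    profiles: Dict[str, Set[str]],
    species: Set[str],
    min_fraction: float = 1.0,
) -> List[str]:
    """Return the IDs of families found in at least `min_fraction` of `species`."""
    if not species:
        raise ValueError("species set must not be empty")
    core = []
    for family, present_in in profiles.items():
        # Fraction of the requested species that carry this family.
        covered = len(present_in & species) / len(species)
        if covered >= min_fraction:
            core.append(family)
    return sorted(core)


# Toy presence/absence profiles; the family IDs are hypothetical.
SPECIES = {"A. thaliana", "O. sativa", "P. patens", "C. reinhardtii"}
PROFILES = {
    "FAM_A": {"A. thaliana", "O. sativa", "P. patens", "C. reinhardtii"},
    "FAM_B": {"A. thaliana", "O. sativa"},
    "FAM_C": {"A. thaliana", "O. sativa", "P. patens"},
}

print(core_families(PROFILES, SPECIES))                     # strict: all species
print(core_families(PROFILES, SPECIES, min_fraction=0.75))  # tolerate one absence
```

Relaxing `min_fraction` below 1.0 is one way to tolerate annotation gaps in individual genomes; the appropriate threshold is a design choice the abstract does not specify.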


  1. Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform. Integrating sequence orthology with expression data to predict functional homologs across plant species. Klaas Vandepoele. Plant Genomes & Biotechnology: From Genes to Networks (CSHL, 1 December 2011). Comparative & Integrative Genomics, VIB – Ghent University, Belgium
  2. Genome sequencing in different plant clades (PLAZA releases 1.0, 2.0, 2.5; from 9 genomes to 25 genomes). Green algae: Chlorophyceae (C. reinhardtii, V. carteri), Prasinophyceae (O. lucimarinus, Micromonas, O. tauri). Mosses: P. patens. Club-mosses: S. moellendorffii. Angiosperms: Monocots (O. sativa japonica, O. sativa indica, S. bicolor, Z. mays, B. distachyon); Eudicots (V. vinifera, L. japonicus, M. truncatula, G. max, P. trichocarpa, M. esculenta, R. communis, F. vesca, C. papaya, M. domestica, T. cacao, A. thaliana, A. lyrata). Further clade labels on the figure: Basal Eudicots, Rosids, Asterids
  3. Exploiting cross-species genome information. Centralized infrastructure; detailed gene catalog per species, with structural annotation (gene models, UTRs) and functional annotation (experimental, sequence-based); intuitive & advanced data mining tools for non-expert users (gene function, genome organization, pathway evolution, data manipulation); computational resources
  4. PLAZA, a resource for plant comparative genomics
  5. Gene family analysis; Genome analysis. 20 tools available! More information? Check Help – Documentation: Data content & Construction; Tutorial & FAQ. Proost, Van Bel, … & Vandepoele, Plant Cell 2009
  7. Comparative sequence analysis. Homology = shared common ancestral origin, inferred from sequence similarity (BLAST) and from similar (multi-)domain composition & organization. So sequence similarity means homology? No, it depends! Figure labels: JGI, TAIR, EMBL; all-against-all sequence similarity search (BLAST); BLASTCLUST, Tribe-MCL, Inparanoid, OrthoMCL, C/KOG
  8. Gene family: similarity heatmap, multiple sequence alignment & phylogenetic trees. >780K proteins from 25 species; 22K multi-species gene families covering 83% of the total proteome; 18K trees incl. 420K annotated tree nodes
  9. Gene family analysis; Genome analysis
  10. Gene colinearity & genome organization. Represent chromosomes (Chromosome 1, Chromosome 2) as sorted gene lists. Identify all homologous gene pairs between chromosomes (all-against-all BLASTP). Score pairs of homologues in a matrix, the Gene Homology Matrix (GHM). i-ADHoRe 3.0. Proost, Fostier, … & Vandepoele, NAR in press
  11. Genome-wide colinearity (WGDotplot): Z. mays vs. O. sativa
  12. Multi-species colinearity
  13. Multi-species WGDotplots - applet
  14. Whole-genome Circular Dotplot. Reference: O. sativa. Inner circle: duplicated regions. Outer circle: inter-species colinear regions
  15. Synteny Plot: local genome organization
  16. Gene family analysis; Genome analysis
  17. Workbench data import. Create a custom gene set (~experiment) using gene identifiers or BLAST. External/internal gene IDs are accepted (e.g. AN3, AT5G28640, GRMZM2G180246_T01). The BLAST interface can be used to map sequence data from a non-model species to a reference species present in PLAZA. A toolbox is available to analyze user-defined gene sets. Figure labels: Microarray transcript profiling; EST sequencing; Genes reported in Suppl. data; PLAZA Workbench; WGMapping; Gene Families; Functional annotations; GO enrichment; Sequence retrieval; Tandem/block duplicates; Orthologs; Export data…
  18. GO enrichment analysis for all 25 species!
  19. Detection of orthologous plant genes. Meaning… Orthology = genes derived from a common ancestor in different species. Functionally conserved homologs = genes in different species having similar functions. Due to gene duplication events, complex many-to-many gene orthology is frequently observed. Functional homologs in different species share … similar expression? regulation? protein-protein interactions?
  20. Orthologous genes – Table view
  21. Integrative Orthology Viewer: an ensemble of different gene orthology prediction approaches. Tree-based orthologs (TROG) inferred using tree reconciliation; orthologous gene families (ORTHO) inferred using OrthoMCL; anchor points, which refer to gene-based colinearity between species; best hit families (BHIF) inferred from BLAST hits including inparalogs
  22. How to evaluate sequence-based orthology methods? Cross-species analysis of orthologs using Expression Context Conservation (ECC). Expression context conservation quantifies shared orthologs in coexpression networks. Example: ECC score = 0.088 (16 shared orthologs / 182 in both coexpression clusters), P-value(conserved) < 0.001. Movahedi, Van de Peer & Vandepoele, Plant Physiology 2011
  23. Orthology support & expression conservation for Arabidopsis – rice orthologs. Global ECC per method: OrthoMCL 60%, BHIF 58%, TROG 54%. [Venn diagram of the three methods; legend: # Ath genes, # Ath – Osa gene pairs, % expression conservation (ECC)]. >3506 Arabidopsis – rice orthologs missed by OrthoMCL show expression conservation (41% ECC)
  24. Conclusions. PLAZA 2.5 provides a versatile toolbox for plant genomics. Expression Context Conservation provides a valuable approach to study orthologs and predict functional homologs across species. The integration of complementary data types extends the scope of complex orthology relationships
  25. Acknowledgments. Plant comparative genomics: Michiel Van Bel, Sebastian Proost, Yves Van de Peer. Evolutionary analysis of expression networks: Sara Movahedi (Plant Physiology 2011 paper)
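
The ECC example on slide 22 (16 shared orthologs out of 182, score 0.088) is consistent with a simple ratio of shared orthologs to all orthologs found across the two coexpression clusters. The Python sketch below works under that assumption: both clusters are taken to be already mapped to a common ortholog identifier space, and the denominator is taken to be the union of the two clusters. This reading is inferred from the slide's numbers and is not the definition from the cited paper; the function name is hypothetical and the significance test behind the reported P-value is not covered.

```python
from typing import Iterable


def ecc_score(cluster_a: Iterable[str], cluster_b: Iterable[str]) -> float:
    """Shared orthologs divided by all orthologs seen in either cluster.

    Assumption: both clusters use the same ortholog identifiers, and the
    denominator is the union of the two clusters.
    """
    a, b = set(cluster_a), set(cluster_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy clusters built to match the counts on the slide:
# 100 and 98 members, 16 shared, 182 distinct orthologs overall.
cluster_species_1 = [f"ORTHO_{i}" for i in range(0, 100)]
cluster_species_2 = [f"ORTHO_{i}" for i in range(84, 182)]

score = ecc_score(cluster_species_1, cluster_species_2)
print(round(score, 3))
```

With these toy clusters the ratio is 16/182, which rounds to the 0.088 shown on the slide.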