The complexity of plant genomes

B-DEBATE - THE FUTURE OF PLANT GENOMES. HARVESTING GENES FOR AGRICULTURE; 9-11 October 2012, CRAG Barcelona

  1. THE COMPLEXITY OF PLANT GENOMES: Genome structure, gene functions and beyond. Klaas Vandepoele, Barcelona, October 10th 2012. Department of Plant Biotechnology and Bioinformatics, Ghent University; Department of Plant Systems Biology, VIB, Belgium. http://twitter.com/plaza_genomics
  2. OVERVIEW
      And then there were many: plant genome sequences
      PLAZA: a web-based plant comparative genomics toolbox
        • Genome organization and evolution
        • The quest for plant orthologous genes
      Unravelling gene functions using integrative plant genomics
      Cross-species gene function analysis
  3. 1. OVERVIEW PLANT GENOME SEQUENCING: Individual institutes; International consortia. Today: ~40 (complete) plant genome sequences
  4. GENOME ANNOTATION [Figure: genome annotation workflow. Recoverable labels: genomic DNA sequences (Genoscope, BGI, JGI); repeats search, repeat masking; training set, coding / intron / intergenic potential search, IMM models, splice site models (SpliceMachine); automatic annotation (Eugene); evidence from ESTs, cDNA, related genomes, Swissprot and Nr_prot (Blastn, Blastx, tBlastx); predicted genes; structural annotation; manual curation and expert-annotated genes (Artemis, GenomeView, Bogas); functional annotation (Gene Ontology, InterPro); annotated genes; downstream analysis.] Source: P. Rouzé
  5. EXPLOITING GENOME INFORMATION
      • Centralized infrastructure
      • Detailed gene catalog per species
          - Structural annotation (gene models, UTRs)
          - Functional annotation (experimental, sequence-based, systems biology)
      • Intuitive & advanced data mining tools for non-expert users
          - Gene function
          - Genome organization
          - Pathway evolution
      • Data manipulation
      • Computational resources
  6. Gene family analysis; Genome analysis. >20 tools available. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  7. HOMOLOGOUS GENE FAMILIES: >780K proteins from 25 species. Protein clustering: 22K multi-species gene families covering 83% of the total proteome. Phylogenetics: 18K trees incl. 420K annotated tree nodes
  8. GENE COLINEARITY & GENOME ORGANIZATION
      • Represent chromosomes as sorted gene lists
      • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP)
      • Score pairs of homologues in a matrix: the Gene Homology Matrix (GHM)
      [Figure: Gene Homology Matrix of Chromosome 1 vs. Chromosome 2; i-ADHoRe 3.0]
  9. GENOMIC PROFILES: pairwise; multiple. Simillion et al. (2004) Genome Res. 14, 1095-1106
  10. IMPROVED SENSITIVITY TO DETECT DEGENERATE GENOMIC HOMOLOGY (#homologous segments). Proost, Fostier … & Vandepoele, NAR 2011
  11. I-ADHORE 3.0: Speed & memory footprint. Fostier, … & Vandepoele, Bioinformatics 2011; Proost, Fostier … & Vandepoele, NAR 2011
  12. GENOME-WIDE COLINEARITY [Figure: WGDotplot, Z. mays vs. O. sativa]
  13. MULTI-SPECIES COLINEARITY profile
  14. WHOLE-GENOME CIRCULAR DOTPLOT. Reference: O. sativa. Inner circle: duplicated regions. Outer circle: inter-species colinear regions
  15. Gene family analysis; Genome analysis. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  16. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; Divide in block & tandem duplicates; Gene sets; PLAZA workbench; GO enrichment. Proost et al., Plant Cell 2009
  17. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; Divide in block & tandem duplicates; Gene sets; Gene Ontology; PLAZA workbench; GO enrichment
  18. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; Divide in block & tandem duplicates; Gene sets; Gene Ontology; PLAZA workbench; GO enrichment
  19. CORE HISTONE CLUSTERS IN C. REINHARDTII: Synteny plot. Proost et al., Plant Cell 2009
  20. THE QUEST FOR PLANT ORTHOLOGS
      Plants are paleopolyploids
        • Dynamic genome organization
        • Large fraction of multi-gene families
        • Absence of simple 1:1 orthology relationships
  21. Source: Y. Van de Peer
  22. GENE DYNAMICS IN THE GREEN LINEAGE [Figure labels: Green algae; Brown algae; Land plants; Diatoms]
  23. PLANT GENE FAMILIES, A TALE OF DUPLICATIONS: F-box protein domain gene family
  24. PLAZA INTEGRATIVE ORTHOLOGY VIEWER
      • Tree-based orthologs (TROG) inferred using tree reconciliation
      • Orthologous gene families (ORTHO) inferred using OrthoMCL
      • Anchor points refer to gene-based colinearity between species
      • Best hit families (BHIF) inferred from Blast hits including inparalogs
      Van Bel et al., Plant Physiology 2012
  25. COMPLEX GENE ORTHOLOGY RELATIONSHIPS. Query species: A. thaliana; Target species
  26. 3. PLANT -OMICS SPACE. Mochida and Shinozaki, 2011
  27. INTEGRATIVE PLANT GENOMICS
      Explore genome-wide -omics data sets to study gene function and regulation
        • Transcriptomics (Microarrays | RNA-Seq)
        • Interactome data (Y2H | TAP)
        • Regulatory interactions (TF | miRNA-target | TF motifs)
      Include expert gene annotations
        • Dedicated databases (e.g. phenotypes, metabolomics)
        • Text-mining
  28. GENE NETWORK ANALYSIS
      Features
        • Integration of heterogeneous -omics data sources
            - Different gene-gene associations with varying quality
            - Missing data
        • Exploit the network-guided guilt-by-association principle
      Methodologies
            - Simple un-weighted/weighted graphs
            - Probabilistic models
      Lee et al., 2010
  29. EXPERIMENTAL ARABIDOPSIS GENE-GENE ASSOCIATION DATA
      Datatype      # Genes   # Associations (% unique)   Source
      PPI             3,194   7,210 (75%)                 CORNET
      AraNet*        19,647   1,062,222 (99%)             Lee et al., 2010
      TF targets      9,422   13,037 (99%)                AtRegNet (AGRIS)
      GO              6,588   89,100 (n.a.)               GeneOntology.org / TAIR
      Total          22,492   1,089,661
      * Probabilistic network integrating heterogeneous genomic features
      Research objectives:
        • Infer functional gene modules starting from experimental data
        • Identify regulatory properties of genes, modules and network
        • Explore cross-species functional annotation
      Heyndrickx and Vandepoele, 2012
  30. DELINEATING ARABIDOPSIS GENE MODULES: Transform gene-gene associations into networks and functional gene modules
  31. CONVERTING STATIC GENE ASSOCIATIONS INTO FUNCTIONAL EXPRESSION MODULES
      Classical approach
        1. Clustering expression data
            • Guide-gene (gene-centric)
            • Non-targeted (global)
        2. Functional analysis of modules using enrichment statistics
      Challenges - weaknesses
        • Which microarray samples to include?
        • Functional information integrated a posteriori
      Aoki et al., 2007
  32. EXPRESSION-BASED CLUSTERING
      Integrate a priori functional information during module detection
      Semi-supervised clustering strategy considering multiple query genes and multiple expression compendia
      Rank aggregation through scoring function → maximize coexpression towards multiple seeds showing a dynamic expression profile
  33. ANALYSIS GENE MODULES
  34. PROPERTIES INPUT - MODULE DATA: 40% of the genes in the modules are present in more than one input data type; only 3% of the gene pairs within a module are supported by more than one primary data type
  35. MODULE OVERLAP
                    Primary data                          Modules
      Datatype      # Genes   # Associations (% unique)   # Genes   # Modules (% unique)   Functional enrichment   Motif enrichment
      PPI             3,194   7,210 (75%)                     597   72 (95%)               51                      43
      AraNet         19,647   1,062,222 (99%)               6,377   419 (99%)              116                     172
      TF targets      9,422   13,037 (99%)                  5,127   518 (96%)              51                      224
      GO              6,588   89,100 (n.a.)                 7,750   1,105 (99%)            943                     341
      Total          22,492   1,089,661                    13,428   2,114                  1,161
      Non-redundant modules                                13,142   1,563                  676                     772
      >99% modules found through a single input data type
  36. FUNCTIONAL AND CIS-REGULATORY COHERENCE OF PLANT MODULES
      Cis-regulatory element analysis
        • Weeder / MotifSampler de novo motif finding (1544 motifs)
        • Overlap with known plant motifs AGRIS/PLACE (34%)
      Functional enrichment analysis
        • Over-representation: hypergeometric distribution + FDR
        • Non-electronic GO annotations + embryo-lethal genes (SeedGenes)
      40% of the modules could be linked to a significant functional enrichment (GO BP - embryo lethality)
      98% of the modules have 1 (or more) gene(s) with a known experimental annotation
  37. FUNCTIONAL MODULE REPERTOIRE: http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
  38. CROSS-SPECIES MODULE ANALYSIS [Figure labels: Affymetrix GeneChip; NCBI Gene Expression Omnibus; 1563 Arabidopsis modules; Integrative orthology]
  39. CONSERVED MODULE EXPRESSION COHERENCE [Example: Lipid biosynthesis]. 58% of modules show significant coexpression coherence (3 or more species). >43,000 unknown genes from 6 other plants receive module-based functional annotations
  40. MODULE-BASED FUNCTION PREDICTIONS
      Can we recover new experimental Arabidopsis gene - GO BP annotations?
      [Figure labels: Data freeze; Evaluation]
      1460 Arabidopsis genes with predictions receive new exp. GO-BP
                       Unknown              Unknown Exp. BP       Other Exp. BP         Total
                       #Pred  #Conf         #Pred  #Conf          #Pred  #Conf          #Pred  #Conf
      All genes          197  75 (38.1%)      255  108 (42.4%)    1,008  251 (24.9%)    1,460  434 (29.7%)
      Conserved          166  65 (39.2%)      195  80 (41%)         871  215 (24.7%)    1,232  360 (29.2%)
      Not conserved       48  10 (20.8%)       83  31 (37.3%)       315  52 (16.5%)       446  93 (20.9%)
  41. DNA ENDOREDUPLICATION
      • PPI module: predicted to be involved in DNA endoreduplication
      • Experimental validation shows that an AT1G06590 T-DNA mutant has a perturbed endoreduplication index (Quimbaya et al., 2012)
      [Figure labels: Plant mutants; Flow cytometry]
      Quimbaya, Vandepoele,… De Veylder, 2012
  42. 4. INTEGRATED CO-EXPRESSION – ORTHOLOGY NETWORKS. Movahedi et al., 2012
  43. 3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1
      • Conserved DNA replication module
      • Conserved E2F target gene (TTTCCGC)
      • Role in sister chromatid cohesion
      Movahedi et al., 2012; Takahashi et al., 2010
  44. SORTING OUT PLANT (CO-)ORTHOLOGS USING EXPRESSION CONTEXT CONSERVATION [Figure labels: Protein integrative orthology; Expression Context Conservation scores (p-value < 0.05); Inparalogs (species-specific duplicates)]
  45. 4. CONCLUSIONS
      Need for advanced & user-friendly tools to characterize new genomes
        • Complexity and quality of genome sequences
        • Scalability with increasing number of genomes
      Integrative approaches combining multiple methods outperform individual methods* and provide users a more complete view
        • Computer power
        • Visualization
      Large discrepancy in the functional gene associations between the different experimental data sets
      A large fraction of the module-based functional predictions are biologically valid and can be transferred across species
      Comparative network approaches provide a powerful tool to integrate functional genomics data
      * Quest for Orthologs Consortium, Bioinformatics 2012
  46. ACKNOWLEDGEMENTS: Ken Heyndrickx, Michiel Van Bel, Sebastian Proost, Sara Movahedi, Mauricio Quimbaya
  47. ACKNOWLEDGEMENTS. Further reading:
      Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell.
      Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol.
      Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant, Cell & Environment.
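
The slides on GO enrichment of gene sets (16-18) and on functional coherence of modules (36) name the statistic only as "hypergeometric distribution + FDR". As a rough illustration of that idea, the sketch below computes a one-sided hypergeometric over-representation p-value per term and then applies Benjamini-Hochberg correction. The function names, the toy gene identifiers and the choice of Benjamini-Hochberg as the FDR procedure are assumptions made for this example; the slides do not say which FDR method or implementation was used.

```python
from math import comb


def hypergeom_tail(k, population, annotated, sample):
    """P(X >= k): probability of drawing at least k annotated genes
    when sampling `sample` genes from `population` without replacement."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.
    (Assumed FDR procedure; the slides only say 'FDR'.)"""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def go_enrichment(module, term_to_genes, background):
    """Return {term: (p_value, q_value)} for every term in term_to_genes."""
    background = set(background)
    module = set(module) & background
    terms = list(term_to_genes)
    pvals = []
    for term in terms:
        annotated = set(term_to_genes[term]) & background
        overlap = len(module & annotated)
        pvals.append(
            hypergeom_tail(overlap, len(background), len(annotated), len(module))
        )
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, qvals)}


if __name__ == "__main__":
    # Toy data: gene and term identifiers are made up for illustration.
    background = {f"g{i}" for i in range(10)}
    terms = {"term_A": {"g0", "g1", "g2", "g3", "g4"}, "term_B": {"g9"}}
    for term, (p, q) in go_enrichment({"g0", "g1", "g2"}, terms, background).items():
        print(term, round(p, 4), round(q, 4))
```

Every term is tested, including those with no overlap, so that the correction accounts for the full number of hypotheses. Exact integer binomials keep the sketch dependency-free; for genome-scale backgrounds a statistics library would be the more practical choice.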

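Slide 8 describes colinearity detection in three steps: represent chromosomes as sorted gene lists, identify homologous gene pairs, and score them in a Gene Homology Matrix (GHM). The sketch below builds such a matrix from a given list of homologous pairs and reports gap-free diagonal runs in either orientation. It is a simplified illustration only, not the i-ADHoRe algorithm: the function names, the `min_len` threshold and the requirement that runs contain no gaps are assumptions made for this example, and the homologous pairs are taken as given rather than computed with BLASTP.

```python
def gene_homology_matrix(chrom_a, chrom_b, homolog_pairs):
    """Boolean matrix: cell [i][j] is True when gene i of chrom_a and
    gene j of chrom_b form a homologous pair (order within a pair ignored)."""
    pairs = {frozenset(p) for p in homolog_pairs}
    return [[frozenset((a, b)) in pairs for b in chrom_b] for a in chrom_a]


def colinear_runs(ghm, min_len=3):
    """Find gap-free diagonal runs of homologous pairs.

    Returns a list of ((row, col), length, step), where step is +1 for
    runs in the same orientation and -1 for inverted runs.
    """
    rows = len(ghm)
    cols = len(ghm[0]) if rows else 0
    runs = []
    for step in (1, -1):
        for i in range(rows):
            for j in range(cols):
                if not ghm[i][j]:
                    continue
                # Skip cells that continue a run started earlier.
                pi, pj = i - 1, j - step
                if 0 <= pi < rows and 0 <= pj < cols and ghm[pi][pj]:
                    continue
                length, ci, cj = 0, i, j
                while 0 <= ci < rows and 0 <= cj < cols and ghm[ci][cj]:
                    length += 1
                    ci += 1
                    cj += step
                if length >= min_len:
                    runs.append(((i, j), length, step))
    return runs


if __name__ == "__main__":
    # Toy chromosomes; gene names are made up for illustration.
    chrom_1 = ["a1", "a2", "a3", "a4"]
    chrom_2 = ["b1", "b2", "b3", "b4"]
    pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
    ghm = gene_homology_matrix(chrom_1, chrom_2, pairs)
    print(colinear_runs(ghm))
```

Requiring strictly adjacent cells keeps the logic short but would miss degenerate colinearity, where gene loss or insertions interrupt the diagonal; slide 10 points out that sensitivity to such degenerate homology is the hard part in practice.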