The complexity of plant genomes
B-DEBATE - THE FUTURE OF PLANT GENOMES. HARVESTING GENES FOR AGRICULTURE; 9-11 October 2012, CRAG Barcelona

Published in: Education

  1. THE COMPLEXITY OF PLANT GENOMES: Genome structure, gene functions and beyond. Klaas Vandepoele, Barcelona, October 10th 2012. Department of Plant Biotechnology and Bioinformatics, Ghent University; Department of Plant Systems Biology, VIB, Belgium. http://twitter.com/plaza_genomics
  2. OVERVIEW • And then there were many: plant genome sequences • PLAZA: a web-based plant comparative genomics toolbox (Genome organization and evolution; The quest for plant orthologous genes) • Unravelling gene functions using integrative plant genomics • Cross-species gene function analysis
  3. 1. OVERVIEW PLANT GENOME SEQUENCING • Individual institutes • International consortia • Today: ~40 (complete) plant genome sequences
  4. GENOME ANNOTATION [Figure: genome annotation workflow. Recoverable labels: Genomic DNA sequences (Genoscope, BGI, JGI); Repeat search and masking; Training set; Coding, intron and intergenic potential (IMM); Splice site models (SpliceMachine); tBlastx, Blastx, Blastn against ESTs, cDNA, related genomes, Swissprot, Nr_prot; Automatic annotation (Eugene); Manual curation (Artemis, GenomeView, Bogas); Structural annotation; Functional annotation (Gene Ontology, InterPro); Predicted genes; Expert annotated genes; Functional annotated genes; Downstream analysis.] Source: P. Rouzé
  5. EXPLOITING GENOME INFORMATION • Centralized infrastructure • Detailed gene catalog per species: Structural annotation (gene models, UTRs); Functional annotation (experimental, sequence-based, systems biology) • Intuitive & advanced data mining tools for non-expert users: Gene function; Genome organization; Pathway evolution • Data manipulation • Computational resources
  6. Gene family analysis • Genome analysis • >20 tools available. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  7. HOMOLOGOUS GENE FAMILIES • >780K proteins from 25 species • Protein clustering: 22K multi-species gene families covering 83% of the total proteome • Phylogenetics: 18K trees incl. 420K annotated tree nodes
  8. GENE COLINEARITY & GENOME ORGANIZATION • Represent chromosomes (Chromosome 1, Chromosome 2) as sorted gene lists • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP) • Score pairs of homologues in a matrix (1: Gene Homology Matrix, GHM; 2: i-ADHoRe 3.0)
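
The three steps on slide 8 can be sketched in a few lines of Python. This is a toy illustration, not the i-ADHoRe 3.0 algorithm: the gene names, the `homologs` set (standing in for all-against-all BLASTP output) and the function names are all hypothetical, and the diagonal search only finds strictly adjacent, forward-oriented runs, whereas real colinearity detection tolerates gaps and inversions.

```python
def gene_homology_matrix(chrom_a, chrom_b, homologs):
    """Build a 0/1 matrix: cell [i][j] is 1 when gene i of chrom_a
    and gene j of chrom_b form a homologous pair."""
    return [
        [1 if frozenset((ga, gb)) in homologs else 0 for gb in chrom_b]
        for ga in chrom_a
    ]


def diagonal_runs(matrix, min_len=3):
    """Report (row, col, length) for every gap-free forward diagonal of 1s
    that is at least min_len long. Gaps and inversions are NOT handled."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs = []
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j] != 1:
                continue
            # Skip cells that continue a run started further up the diagonal.
            if i > 0 and j > 0 and matrix[i - 1][j - 1] == 1:
                continue
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length] == 1):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


# Hypothetical sorted gene lists for two chromosomes.
chrom_1 = ["a1", "a2", "a3", "a4"]
chrom_2 = ["b0", "b1", "b2", "b3"]
# Hypothetical homologous pairs, e.g. parsed from BLASTP hits.
pairs = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]}

ghm = gene_homology_matrix(chrom_1, chrom_2, pairs)
print(diagonal_runs(ghm))
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two genes were reported.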
  9. GENOMIC PROFILES • pairwise • multiple. Simillion et al. (2004) Genome Res. 14, 1095-1106
  10. IMPROVED SENSITIVITY TO DETECT DEGENERATE GENOMIC HOMOLOGY (#homologous segments). Proost, Fostier … & Vandepoele, NAR 2011
  11. I-ADHORE 3.0 • Speed & memory footprint. Fostier, … & Vandepoele, Bioinformatics 2011; Proost, Fostier … & Vandepoele, NAR 2011
  12. GENOME-WIDE COLINEARITY • WGDotplot: Z. mays vs. O. sativa
  13. MULTI-SPECIES COLINEARITY • profile
  14. WHOLE-GENOME CIRCULAR DOTPLOT • Reference: O. sativa • Inner circle: duplicated regions • Outer circle: inter-species colinear regions
  15. Gene family analysis • Genome analysis. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  16. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES • Species-specific duplicates • Divide into block & tandem duplicates • Gene sets • PLAZA workbench • GO enrichment. Proost et al., Plant Cell 2009
  17. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES • Species-specific duplicates • Divide into block & tandem duplicates • Gene sets • Gene Ontology • PLAZA workbench • GO enrichment
  18. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES • Species-specific duplicates • Divide into block & tandem duplicates • Gene sets • Gene Ontology • PLAZA workbench • GO enrichment
  19. CORE HISTONE CLUSTERS IN C. REINHARDTII • Synteny plot. Proost et al., Plant Cell 2009
  20. THE QUEST FOR PLANT ORTHOLOGS • Plants are paleopolyploids • Dynamic genome organization • Large fraction of multi-gene families • Absence of simple 1:1 orthology relationships
  21. Source: Y. Van de Peer
  22. GENE DYNAMICS IN THE GREEN LINEAGE • Green algae • Brown algae • Land plants • Diatoms
  23. PLANT GENE FAMILIES, A TALE OF DUPLICATIONS • F-box protein domain gene family
  24. PLAZA INTEGRATIVE ORTHOLOGY VIEWER • Tree-based orthologs (TROG) inferred using tree reconciliation • Orthologous gene families (ORTHO) inferred using OrthoMCL • Anchor points refer to gene-based colinearity between species • Best hit families (BHIF) inferred from Blast hits including inparalogs. Van Bel et al., Plant Physiology 2012
  25. COMPLEX GENE ORTHOLOGY RELATIONSHIPS • Query species: A. thaliana • Target species
  26. 3. PLANT –OMICS SPACE. Mochida and Shinozaki, 2011
  27. INTEGRATIVE PLANT GENOMICS • Explore genome-wide –omics data sets to study gene function and regulation: Transcriptomics (Microarrays|RNA-Seq); Interactome data (Y2H|TAP); Regulatory interactions (TF|miRNA-target|TF motifs) • Include expert gene annotations: Dedicated databases (e.g. phenotypes, metabolomics); Text-mining
  28. GENE NETWORK ANALYSIS • Features: Integration of heterogeneous –omics data sources; Different gene-gene associations with varying quality; Missing data; Exploit network-guided guilt-by-association principle • Methodologies: Simple un-weighted/weighted graphs; Probabilistic models. Lee et al., 2010
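
The guilt-by-association principle named on slide 28 can be illustrated on a simple weighted graph. The sketch below is a minimal, assumed formulation (sum of edge weights to annotated neighbours, best-scoring term wins); it is not the probabilistic model of Lee et al., and the gene identifiers, weights and the function name `guilt_by_association` are hypothetical.

```python
from collections import defaultdict


def guilt_by_association(edges, annotations):
    """For each unannotated gene, score every functional term by the summed
    weight of edges to neighbours carrying that term, and keep the best term.

    edges: iterable of (gene_a, gene_b, weight), treated as undirected.
    annotations: dict mapping gene -> set of known functional terms.
    Returns dict mapping gene -> (best_term, score).
    """
    neighbours = defaultdict(dict)
    for a, b, w in edges:
        neighbours[a][b] = w
        neighbours[b][a] = w

    predictions = {}
    for gene, nbrs in neighbours.items():
        if gene in annotations:
            continue  # only predict for genes without a known function
        scores = defaultdict(float)
        for nbr, w in nbrs.items():
            for term in annotations.get(nbr, ()):
                scores[term] += w
        if scores:
            # Highest score wins; ties are broken by term name for determinism.
            best = max(scores.items(), key=lambda kv: (kv[1], kv[0]))
            predictions[gene] = best
    return predictions


# Hypothetical toy network: weights stand for association confidence.
toy_edges = [("g1", "g2", 0.9), ("g1", "g3", 0.4), ("g3", "g4", 0.8)]
toy_annotations = {"g2": {"DNA replication"}, "g3": {"lipid biosynthesis"}}
print(guilt_by_association(toy_edges, toy_annotations))
```

With an un-weighted graph, setting every weight to 1 reduces the score to a count of annotated neighbours.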
  29. EXPERIMENTAL ARABIDOPSIS GENE-GENE ASSOCIATION DATA
      Datatype    # Genes  # Associations (% unique)  Source
      PPI         3,194    7,210 (75%)                CORNET
      AraNet*     19,647   1,062,222 (99%)            Lee et al., 2010
      TF targets  9,422    13,037 (99%)               AtRegNet (AGRIS)
      GO          6,588    89,100 (n.a.)              GeneOntology.org / TAIR
      Total       22,492   1,089,661
      * Probabilistic network integrating heterogeneous genomic features
      Research objectives: • Infer functional gene modules starting from experimental data • Identify regulatory properties of genes, modules and network • Explore cross-species functional annotation. Heyndrickx and Vandepoele, 2012
  30. DELINEATING ARABIDOPSIS GENE MODULES • Transform gene-gene associations into networks and functional gene modules
  31. CONVERTING STATIC GENE ASSOCIATIONS INTO FUNCTIONAL EXPRESSION MODULES • Classical approach: (1) Clustering expression data, either guide-gene (gene-centric) or non-targeted (global); (2) Functional analysis of modules using an enrichment statistic • Challenges and weaknesses: Which microarray samples to include? Functional information is integrated a posteriori. Aoki et al., 2007
  32. EXPRESSION-BASED CLUSTERING • Integrate a priori functional information during module detection • Semi-supervised clustering strategy considering multiple query genes and multiple expression compendia • Rank aggregation through a scoring function: maximize coexpression towards multiple seeds showing a dynamic expression profile
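
Slide 32 does not spell out the scoring function, so the following is only a sketch of the general idea of rank aggregation towards multiple seed genes. It assumes Pearson correlation as the coexpression measure and mean rank as the aggregate score; both choices, along with the gene names and expression values, are illustrative assumptions rather than the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def aggregate_ranks(expression, seeds):
    """Rank candidate genes by coexpression with each seed separately,
    then order them by mean rank across seeds (assumed scoring function)."""
    if not seeds:
        raise ValueError("at least one seed gene is required")
    candidates = [g for g in expression if g not in seeds]
    total_rank = {g: 0 for g in candidates}
    for seed in seeds:
        ordered = sorted(
            candidates,
            key=lambda g: pearson(expression[seed], expression[g]),
            reverse=True,
        )
        for rank, gene in enumerate(ordered, start=1):
            total_rank[gene] += rank
    return sorted(candidates, key=lambda g: total_rank[g] / len(seeds))


# Hypothetical expression compendium: gene -> values across four samples.
compendium = {
    "seed1": [1, 2, 3, 4],
    "seed2": [2, 4, 6, 9],
    "geneA": [1, 2, 3, 5],
    "geneB": [4, 3, 2, 1],
    "geneC": [1, 3, 2, 4],
}
print(aggregate_ranks(compendium, {"seed1", "seed2"}))
```

Aggregating ranks rather than raw correlations keeps one strongly correlated seed from dominating the result; extending the sketch to several expression compendia would add one more ranking per compendium.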
  33. ANALYSIS GENE MODULES
  34. PROPERTIES INPUT – MODULE DATA • 40% of the genes in the modules are present in more than one input data type • Only 3% of the gene pairs within a module are supported by more than one primary data type
  35. MODULE OVERLAP
                            Primary data                        Modules
      Datatype              # Genes  # Associations (% unique)  # Genes  # Modules (% unique)  Functional enrichment  Motif enrichment
      PPI                   3,194    7,210 (75%)                597      72 (95%)              51                     43
      AraNet                19,647   1,062,222 (99%)            6,377    419 (99%)             116                    172
      TF targets            9,422    13,037 (99%)               5,127    518 (96%)             51                     224
      GO                    6,588    89,100 (n.a.)              7,750    1,105 (99%)           943                    341
      Total                 22,492   1,089,661                  13,428   2,114                 1,161
      Non-redundant modules                                     13,142   1,563                 676                    772
      >99% of modules found through a single input data type
  36. FUNCTIONAL AND CIS-REGULATORY COHERENCE OF PLANT MODULES • Cis-regulatory element analysis: Weeder / MotifSampler de novo motif finding (1544 motifs); Overlap with known plant motifs AGRIS/PLACE (34%) • Functional enrichment analysis: Over-representation using the hypergeometric distribution + FDR; Non-electronic GO annotations + embryo-lethal genes (SeedGenes) • 40% of the modules could be linked to a significant functional enrichment (GO BP - embryo lethality) • 98% of the modules have 1 (or more) gene(s) with a known experimental annotation
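
The over-representation test on slide 36 (hypergeometric distribution plus FDR) can be written with the Python standard library alone. The slide does not say which FDR procedure was applied; the sketch below assumes Benjamini-Hochberg, and the function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the module, k: module genes carrying the annotation.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order
    (assumed FDR procedure; the slide only says 'FDR')."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: a 20-gene module with 8 genes carrying a term that
# annotates 200 of 20,000 background genes, tested alongside two other terms.
p_term = hypergeom_pvalue(k=8, n=20, K=200, N=20000)
print(benjamini_hochberg([p_term, 0.04, 0.30]))
```

In practice one p-value is computed per module and GO term, and the correction is applied across all of those tests together.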
  37. FUNCTIONAL MODULE REPERTOIRE http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
  38. CROSS-SPECIES MODULE ANALYSIS • Affymetrix GeneChip • NCBI Gene Expression Omnibus • 1563 Arabidopsis modules • Integrative orthology
  39. CONSERVED MODULE EXPRESSION COHERENCE • Lipid biosynthesis • 58% of modules show significant coexpression coherence (3 or more species) • >43,000 unknown genes from 6 other plants receive module-based functional annotations
  40. MODULE-BASED FUNCTION PREDICTIONS • Can we recover new experimental Arabidopsis gene – GO BP annotations? • Data freeze • Evaluation • 1460 Arabidopsis genes with predictions receive new exp. GO-BP
                     Unknown              Unknown Exp. BP      Other Exp. BP        Total
                     #Pred  #Conf         #Pred  #Conf         #Pred  #Conf         #Pred  #Conf
      All genes      197    75 (38.1%)    255    108 (42.4%)   1,008  251 (24.9%)   1,460  434 (29.7%)
      Conserved      166    65 (39.2%)    195    80 (41%)      871    215 (24.7%)   1,232  360 (29.2%)
      Not conserved  48     10 (20.8%)    83     31 (37.3%)    315    52 (16.5%)    446    93 (20.9%)
  41. DNA ENDOREDUPLICATION • PPI module: predicted to be involved in DNA endoreduplication • Experimental validation shows that the AT1G06590 T-DNA line has a perturbed endoreduplication index (Quimbaya et al., 2012) • Plant mutants • Flow cytometry. Quimbaya, Vandepoele, … De Veylder, 2012
  42. 4. INTEGRATED CO-EXPRESSION – ORTHOLOGY NETWORKS. Movahedi et al., 2012
  43. 3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1 • Conserved DNA replication module • Conserved E2F target gene (TTTCCGC) • Role in sister chromatid cohesion. Movahedi et al., 2012; Takahashi et al., 2010
  44. SORTING OUT PLANT (CO-)ORTHOLOGS USING EXPRESSION CONTEXT CONSERVATION • Protein integrative orthology • Expression Context Conservation scores (p-value < 0.05) • Inparalogs (species-specific duplicates)
  45. 4. CONCLUSIONS • Need for advanced & user-friendly tools to characterize new genomes: Complexity and quality of genome sequences; Scalability with increasing number of genomes • Integrative approaches combining multiple methods outperform individual methods* and provide users a more complete view: Computer power; Visualization • Large discrepancy in the functional gene associations between the different experimental data sets • A large fraction of the module-based functional predictions are biologically valid and can be transferred across species • Comparative network approaches provide a powerful tool to integrate functional genomics data. * Quest for Orthologs Consortium, Bioinformatics 2012
  46. ACKNOWLEDGEMENTS • Ken Heyndrickx • Michiel Van Bel • Sebastian Proost • Sara Movahedi • Mauricio Quimbaya
  47. ACKNOWLEDGEMENTS / Further reading • Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell. • Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol. • Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant, Cell & Environment.
