The complexity of plant genomes
B-DEBATE - THE FUTURE OF PLANT GENOMES. HARVESTING GENES FOR AGRICULTURE; 9-11 October 2012, CRAG Barcelona

The complexity of plant genomes Presentation Transcript

  • 1. THE COMPLEXITY OF PLANT GENOMES: Genome structure, gene functions and beyond. Klaas Vandepoele. Barcelona, October 10th 2012. Department of Plant Biotechnology and Bioinformatics, Ghent University; Department of Plant Systems Biology, VIB, Belgium. http://twitter.com/plaza_genomics
  • 2. OVERVIEW: And then there were many: plant genome sequences. PLAZA: a web-based plant comparative genomics toolbox (genome organization and evolution; the quest for plant orthologous genes). Unravelling gene functions using integrative plant genomics. Cross-species gene function analysis.
  • 3. 1. OVERVIEW PLANT GENOME SEQUENCING: Individual institutes; international consortia. Today: ~40 (complete) plant genome sequences
  • 4. GENOME ANNOTATION [Workflow diagram. Labels recoverable from the figure: Genoscope, BGI, JGI; genomic DNA sequences; repeats search, repeat mask; training set; coding potential, intron potential, intergenic potential search (IMM); build splice site models (SpliceMachine); tBlastx, Blastx, Blastn against related genomes, Swissprot, Nr_prot, EST, cDNA; automatic annotation (Eugene); predicted genes; manual curation (Artemis, GenomeView, Bogas); structural annotation; expert annotated genes; functional annotation (Gene Ontology, InterPro); functional annotated genes; downstream analysis] Source: P. Rouzé
  • 5. EXPLOITING GENOME INFORMATION: Centralized infrastructure; detailed gene catalog per species; structural annotation (gene models, UTRs); functional annotation (experimental, sequence-based, systems biology); intuitive & advanced data mining tools for non-expert users; gene function; genome organization; pathway evolution; data manipulation; computational resources
  • 6. Gene family analysis; Genome analysis. >20 tools available. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  • 7. HOMOLOGOUS GENE FAMILIES: >780K proteins from 25 species. Protein clustering: 22K multi-species gene families covering 83% of the total proteome. Phylogenetics: 18K trees incl. 420K annotated tree nodes
  • 8. GENE COLINEARITY & GENOME ORGANIZATION (i-ADHoRe 3.0): • Represent chromosomes as sorted gene lists. • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP). • Score pairs of homologues in a matrix, the Gene Homology Matrix (GHM). [Figure labels: Chromosome 1, Chromosome 2]
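The three steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the i-ADHoRe 3.0 implementation: the function names, the gene identifiers and the `family_of` lookup (a stand-in for the all-against-all BLASTP result) are assumptions made for the example, and only perfect, gap-free diagonals are reported, whereas a real colinearity detector would also tolerate gaps and inversions.

```python
from typing import Dict, List, Tuple


def gene_homology_matrix(chrom_a: List[str], chrom_b: List[str],
                         family_of: Dict[str, str]) -> List[List[int]]:
    """Build a 0/1 matrix; cell [i][j] is 1 when gene i of chrom_a and
    gene j of chrom_b belong to the same (hypothetical) homology family."""
    matrix = []
    for gene_a in chrom_a:
        fam_a = family_of.get(gene_a)
        row = [1 if fam_a is not None and family_of.get(gene_b) == fam_a else 0
               for gene_b in chrom_b]
        matrix.append(row)
    return matrix


def diagonal_runs(matrix: List[List[int]],
                  min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (row, col, length) for every gap-free forward diagonal of
    homologous pairs with at least min_len genes."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs = []
    for i in range(rows):
        for j in range(cols):
            if not matrix[i][j]:
                continue
            # Skip cells that continue a diagonal started earlier.
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    # Chromosomes as gene lists sorted by position (made-up identifiers).
    chrom_1 = ["a1", "a2", "a3", "a4"]
    chrom_2 = ["b0", "b1", "b2", "b3"]
    families = {"a1": "F1", "a2": "F2", "a3": "F3", "a4": "F9",
                "b0": "F7", "b1": "F1", "b2": "F2", "b3": "F3"}
    ghm = gene_homology_matrix(chrom_1, chrom_2, families)
    print(diagonal_runs(ghm))
```

Sorting the genes by position first is what turns conserved gene order into a visible diagonal in the matrix.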
  • 9. GENOMIC PROFILES: pairwise; multiple. Simillion et al. (2004) Genome Res. 14, 1095-1106
  • 10. IMPROVED SENSITIVITY TO DETECT DEGENERATE GENOMIC HOMOLOGY (#homologous segments). Proost, Fostier … & Vandepoele, NAR 2011
  • 11. I-ADHORE 3.0: Speed & memory footprint. Fostier, … & Vandepoele, Bioinformatics 2011; Proost, Fostier … & Vandepoele, NAR 2011
  • 12. GENOME-WIDE COLINEARITY: WGDotplot, Z. mays and O. sativa
  • 13. MULTI-SPECIES COLINEARITY profile
  • 14. WHOLE-GENOME CIRCULAR DOTPLOT. Reference: O. sativa. Inner circle: duplicated regions. Outer circle: inter-species colinear regions
  • 15. Gene family analysis; Genome analysis. Proost et al., Plant Cell 2009; Van Bel et al., 2012
  • 16. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; divide into block & tandem duplicates; gene sets; PLAZA workbench; GO enrichment. Proost et al., Plant Cell 2009
  • 17. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; divide into block & tandem duplicates; gene sets; Gene Ontology; PLAZA workbench; GO enrichment
  • 18. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES: Species-specific duplicates; divide into block & tandem duplicates; gene sets; Gene Ontology; PLAZA workbench; GO enrichment
  • 19. CORE HISTONE CLUSTERS IN C. REINHARDTII: Synteny plot. Proost et al., Plant Cell 2009
  • 20. THE QUEST FOR PLANT ORTHOLOGS: Plants are paleopolyploids; dynamic genome organization; large fraction of multi-gene families; absence of simple 1:1 orthology relationships
  • 21. Source: Y. Van de Peer
  • 22. GENE DYNAMICS IN THE GREEN LINEAGE: Green algae; brown algae; land plants; diatoms
  • 23. PLANT GENE FAMILIES, A TALE OF DUPLICATIONS: F-box protein domain gene family
  • 24. PLAZA INTEGRATIVE ORTHOLOGY VIEWER: • Tree-based orthologs (TROG) inferred using tree reconciliation • Orthologous gene families (ORTHO) inferred using OrthoMCL • Anchor points refer to gene-based colinearity between species • Best hit families (BHIF) inferred from Blast hits including inparalogs. Van Bel et al., Plant Physiology 2012
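One way to picture how four evidence types can be combined is to count, for each query-target gene pair, how many methods support it. The sketch below does only that; it is not the PLAZA implementation, and the function names and gene identifiers are hypothetical. The method labels (TROG, ORTHO, ANCHOR, BHIF) follow the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (query gene, target gene)


def integrate_orthology(evidence: Dict[str, Iterable[Pair]]) -> Dict[Pair, List[str]]:
    """Map every gene pair to the sorted list of methods that report it."""
    support = defaultdict(set)
    for method, pairs in evidence.items():
        for query, target in pairs:
            support[(query, target)].add(method)
    return {pair: sorted(methods) for pair, methods in support.items()}


def rank_targets(support: Dict[Pair, List[str]], query: str) -> List[Tuple[str, List[str]]]:
    """List the targets of one query gene, best supported first."""
    hits = [(target, methods) for (q, target), methods in support.items() if q == query]
    return sorted(hits, key=lambda hit: (-len(hit[1]), hit[0]))


if __name__ == "__main__":
    # Made-up predictions for a single query gene.
    evidence = {
        "TROG": [("AT_query", "OS_1")],
        "ORTHO": [("AT_query", "OS_1"), ("AT_query", "OS_2")],
        "ANCHOR": [("AT_query", "OS_1")],
        "BHIF": [("AT_query", "OS_2"), ("AT_query", "OS_3")],
    }
    for target, methods in rank_targets(integrate_orthology(evidence), "AT_query"):
        print(target, len(methods), methods)
```

Keeping the list of supporting methods, rather than a single yes/no call, lets a user see why a candidate ortholog is ranked where it is.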
  • 25. COMPLEX GENE ORTHOLOGY RELATIONSHIPS. Query species: A. thaliana. Target species
  • 26. 3. PLANT –OMICS SPACE. Mochida and Shinozaki, 2011
  • 27. INTEGRATIVE PLANT GENOMICS: Explore genome-wide –omics data sets to study gene function and regulation: transcriptomics (microarrays | RNA-Seq); interactome data (Y2H | TAP); regulatory interactions (TF | miRNA-target | TF motifs). Include expert gene annotations: dedicated databases (e.g. phenotypes, metabolomics); text-mining
  • 28. GENE NETWORK ANALYSIS. Features: integration of heterogeneous –omics data sources; different gene-gene associations with varying quality; missing data; exploit the network-guided guilt-by-association principle. Methodologies: simple un-weighted/weighted graphs; probabilistic models. Lee et al., 2010
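The guilt-by-association principle can be illustrated with a simple weighted graph: an uncharacterized gene inherits candidate functions from its annotated neighbours, each vote weighted by the edge score. The sketch below is an assumption-laden toy, not the method of Lee et al.; the function name, gene identifiers, weights and annotations are all invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def predict_function(gene: str,
                     edges: Dict[Tuple[str, str], float],
                     annotations: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score each function term by the summed weight of the edges that link
    `gene` to neighbours carrying that term (undirected graph)."""
    scores = defaultdict(float)
    for (a, b), weight in edges.items():
        if a == gene:
            neighbour = b
        elif b == gene:
            neighbour = a
        else:
            continue
        for term in annotations.get(neighbour, ()):
            scores[term] += weight
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical weighted associations and known annotations.
    edges = {("g1", "g2"): 0.9, ("g1", "g3"): 0.25, ("g4", "g1"): 0.25}
    annotations = {"g2": {"DNA replication"},
                   "g3": {"lipid biosynthesis"},
                   "g4": {"lipid biosynthesis"}}
    print(predict_function("g1", edges, annotations))
```

Setting every weight to 1.0 gives the un-weighted variant; neighbours without annotations simply contribute no votes, which is one crude way of handling missing data.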
  • 29. EXPERIMENTAL ARABIDOPSIS GENE-GENE ASSOCIATION DATA (Heyndrickx and Vandepoele, 2012)
    Datatype: # Genes; # Associations (% unique); Source
    PPI: 3,194; 7,210 (75%); CORNET
    AraNet*: 19,647; 1,062,222 (99%); Lee et al., 2010
    TF targets: 9,422; 13,037 (99%); AtRegNet (AGRIS)
    GO: 6,588; 89,100 (n.a.); GeneOntology.org / TAIR
    Total: 22,492; 1,089,661
    * Probabilistic network integrating heterogeneous genomic features
    Research objectives: • Infer functional gene modules starting from experimental data • Identify regulatory properties of genes, modules and network • Explore cross-species functional annotation
  • 30. DELINEATING ARABIDOPSIS GENE MODULES: Transform gene-gene associations into networks and functional gene modules
  • 31. CONVERTING STATIC GENE ASSOCIATIONS INTO FUNCTIONAL EXPRESSION MODULES. Classical approach: 1. Clustering expression data: guide-gene (gene-centric) or non-targeted (global). 2. Functional analysis of modules using enrichment statistics. Challenges and weaknesses: Which microarray samples to include? Functional information is integrated a posteriori. Aoki et al., 2007
  • 32. EXPRESSION-BASED CLUSTERING: Integrate a priori functional information during module detection. Semi-supervised clustering strategy considering multiple query genes and multiple expression compendia. Rank aggregation through a scoring function: maximize coexpression towards multiple seeds showing a dynamic expression profile
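The slide does not spell out the scoring function, so the sketch below substitutes a plain mean-rank aggregation: in each expression compendium, candidate genes are ranked by Pearson correlation to each seed gene, and the ranks are then averaged over all seeds and compendia. The function names, gene identifiers and expression values are hypothetical, and mean rank is an assumed stand-in for the actual scoring function.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def aggregate_ranks(compendia: List[Dict[str, Sequence[float]]],
                    seeds: List[str]) -> List[Tuple[float, str]]:
    """Return (mean rank, gene) for all non-seed genes, best candidates first."""
    seed_set = set(seeds)
    ranks = defaultdict(list)
    for expr in compendia:
        candidates = [g for g in expr if g not in seed_set]
        for seed in seeds:
            if seed not in expr:
                continue  # seed not measured in this compendium
            ordered = sorted(candidates,
                             key=lambda g: (-pearson(expr[seed], expr[g]), g))
            for rank, gene in enumerate(ordered, start=1):
                ranks[gene].append(rank)
    return sorted((sum(r) / len(r), gene) for gene, r in ranks.items())


if __name__ == "__main__":
    compendium = {"s1": [1, 2, 3, 4], "s2": [2, 4, 6, 8],
                  "c1": [1, 2, 3, 5], "c2": [4, 3, 2, 1], "c3": [1, 1, 2, 1]}
    print(aggregate_ranks([compendium], ["s1", "s2"]))
```

Working on ranks rather than raw correlation values makes compendia with different sample counts and noise levels comparable before they are combined.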
  • 33. ANALYSIS GENE MODULES
  • 34. PROPERTIES INPUT – MODULE DATA: 40% of the genes in the modules are present in more than one input data type, but only 3% of the gene pairs within a module are supported by more than one primary data type
  • 35. MODULE OVERLAP
    Datatype: Primary data # Genes; Primary data # Associations (% unique); Modules # Genes; # Modules (% unique); Functional Enrichment; Motif Enrichment
    PPI: 3,194; 7,210 (75%); 597; 72 (95%); 51; 43
    AraNet: 19,647; 1,062,222 (99%); 6,377; 419 (99%); 116; 172
    TF targets: 9,422; 13,037 (99%); 5,127; 518 (96%); 51; 224
    GO: 6,588; 89,100 (n.a.); 7,750; 1,105 (99%); 943; 341
    Total: 22,492; 1,089,661; 13,428; 2,114; 1,161
    Non-redundant Modules: 13,142 genes; 1,563 modules; 676 with functional enrichment; 772 with motif enrichment
    >99% of modules are found through a single input data type
  • 36. FUNCTIONAL AND CIS-REGULATORY COHERENCE OF PLANT MODULES. Cis-regulatory element analysis: • Weeder / MotifSampler de novo motif finding (1544 motifs) • Overlap with known plant motifs from AGRIS/PLACE (34%). Functional enrichment analysis: • Over-representation tested with the hypergeometric distribution + FDR • Non-electronic GO annotations + embryo-lethal genes (SeedGenes). 40% of the modules could be linked to a significant functional enrichment (GO BP, embryo lethality). 98% of the modules have one or more genes with a known experimental annotation
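The over-representation test named on this slide can be written down compactly. The sketch below computes the upper-tail hypergeometric p-value for one module and one annotation term, then corrects a list of p-values with the Benjamini-Hochberg procedure. The slide only says "FDR", so Benjamini-Hochberg is an assumption, as are the function names and the example numbers.

```python
from math import comb
from typing import List


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated genes in a module of
    n genes, when K of the N background genes carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical module: 20 genes, 8 carry a term found in 200 of 20,000 genes.
    p = hypergeom_pvalue(k=8, n=20, K=200, N=20000)
    print(p)
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Integer arithmetic via `math.comb` keeps the tail sum exact until the final division, which avoids underflow for the very small p-values typical of enrichment analyses.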
  • 37. FUNCTIONAL MODULE REPERTOIRE http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
  • 38. CROSS-SPECIES MODULE ANALYSIS: 1563 Arabidopsis modules; integrative orthology; Affymetrix GeneChip; NCBI Gene Expression Omnibus
  • 39. CONSERVED MODULE EXPRESSION COHERENCE. Lipid biosynthesis. 58% of modules show significant coexpression coherence (3 or more species). >43,000 unknown genes from 6 other plants receive module-based functional annotations
  • 40. MODULE-BASED FUNCTION PREDICTIONS: Can we recover new experimental Arabidopsis gene – GO BP annotations? Data freeze; evaluation. 1460 Arabidopsis genes with predictions receive new exp. GO-BP annotations
    Columns per category: #Pred / #Conf (% of #Pred)
    All Genes: Unknown 197 / 75 (38.1%); Unknown Exp. BP 255 / 108 (42.4%); Other Exp. BP 1,008 / 251 (24.9%); Total 1,460 / 434 (29.7%)
    Conserved: Unknown 166 / 65 (39.2%); Unknown Exp. BP 195 / 80 (41%); Other Exp. BP 871 / 215 (24.7%); Total 1,232 / 360 (29.2%)
    Not Conserved: Unknown 48 / 10 (20.8%); Unknown Exp. BP 83 / 31 (37.3%); Other Exp. BP 315 / 52 (16.5%); Total 446 / 93 (20.9%)
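The percentages in this table are the share of predictions that were later confirmed (#Conf divided by #Pred). A small check using the "All Genes" and "Not Conserved" rows is sketched below; the variable and function names are chosen for illustration, and the reading of #Conf as "confirmed predictions" is an interpretation of the slide.

```python
def confirmation_rate(n_pred: int, n_conf: int) -> float:
    """Percentage of predictions confirmed, rounded to one decimal."""
    return round(100.0 * n_conf / n_pred, 1)


# (#Pred, #Conf) per category, copied from the slide.
all_genes = {"Unknown": (197, 75),
             "Unknown Exp. BP": (255, 108),
             "Other Exp. BP": (1008, 251)}
not_conserved = {"Unknown": (48, 10),
                 "Unknown Exp. BP": (83, 31),
                 "Other Exp. BP": (315, 52)}

for label, row in (("All Genes", all_genes), ("Not Conserved", not_conserved)):
    total_pred = sum(pred for pred, _ in row.values())
    total_conf = sum(conf for _, conf in row.values())
    rates = {cat: confirmation_rate(p, c) for cat, (p, c) in row.items()}
    print(label, rates, "total:", total_pred, total_conf,
          confirmation_rate(total_pred, total_conf))
```

The category counts sum to the Total column in both rows (1,460 / 434 and 446 / 93), which confirms the column grouping used when reading the flattened table.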
  • 41. DNA ENDOREDUPLICATION: • PPI module predicted to be involved in DNA endoreduplication • Experimental validation shows that an AT1G06590 T-DNA mutant has a perturbed endoreduplication index (Quimbaya et al., 2012). [Figure labels: Plant Mutants; Flow Cytometry] Quimbaya, Vandepoele,… De Veylder, 2012
  • 42. 4. INTEGRATED CO-EXPRESSION – ORTHOLOGY NETWORKS. Movahedi et al., 2012
  • 43. 3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1: • Conserved DNA replication module • Conserved E2F target gene (TTTCCGC) • Role in sister chromatid cohesion. Movahedi et al., 2012; Takahashi et al., 2010
  • 44. SORTING OUT PLANT (CO-)ORTHOLOGS USING EXPRESSION CONTEXT CONSERVATION: Protein integrative orthology; Expression Context Conservation scores (p-value < 0.05); inparalogs (species-specific duplicates)
  • 45. 4. CONCLUSIONS: Need for advanced & user-friendly tools to characterize new genomes (complexity and quality of genome sequences; scalability with increasing number of genomes). Integrative approaches combining multiple methods outperform individual methods* and provide users a more complete view (computer power; visualization). Large discrepancy in the functional gene associations between the different experimental data sets. A large fraction of the module-based functional predictions are biologically valid and can be transferred across species. Comparative network approaches provide a powerful tool to integrate functional genomics data. * Quest for Orthologs Consortium, Bioinformatics 2012
  • 46. ACKNOWLEDGEMENTS: Ken Heyndrickx, Michiel Van Bel, Sebastian Proost, Sara Movahedi, Mauricio Quimbaya
  • 47. ACKNOWLEDGEMENTS: Further reading
    Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell
    Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol.
    Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant, Cell & Environment