Inferring gene functions and regulatory interactions in plants using different flavors of comparative genomics

Slides from an invited talk at the VISCEA conference Plant Gene Discovery & “Omics” Technologies, February 17-18, 2014, Vienna, Austria.

Reference CNS: http://www.plantcell.org/content/early/2014/07/02/tpc.114.127001.abstract

Transcript

  • 1. Inferring gene functions and regulatory interactions in plants using different flavors of comparative genomics
      Klaas Vandepoele
      Comparative & Integrative Genomics group
      Department of Plant Biotechnology and Bioinformatics, Ghent University
      Department of Plant Systems Biology, VIB, Belgium
      Vienna, February 2014
      http://twitter.com/plaza_genomics
  • 2. Overview
      1. Comparative co-expression analysis
          • Principles & workflow
          • Applications
      2. Inference of transcriptional networks using an ensemble framework for phylogenetic footprinting in plants
          • Methodology
          • Experimental evaluation
          • Genome-wide regulatory annotation
      3. Conclusions & perspectives
      Jan Van de Velde, Sara Movahedi, Ken Heyndrickx
  • 3. 1. Comparative co-expression analysis Movahedi, … & Vandepoele (2012) Plant, Cell & Environment
  • 4. 3-species co-expression analysis: ETG1
      • Conserved DNA replication module
      • Conserved E2F target gene (TTTCCGC)
      • Conserved role in sister chromatid cohesion*
      *Takahashi et al., PLoS Genetics 2010
  • 5. Function prediction using co-expression analysis
      • Recovery of experimental functions using cross-species co-expression
      [Figure: cross-species vs. Arabidopsis-only co-expression; axis labels not recoverable]
      Clustering
      • Top 300 co-expressed genes based on the Pearson product-moment correlation coefficient
      • Cluster Affinity Search Technique (CAST) pruning
      • Size correction between cross-species and Arabidopsis gene co-expression clusters
      • Functional enrichment analysis
      Benchmark
      • 1,402 Gene Ontology Biological Process terms (10-1,000 genes); n = 119,467 non-electronic gene-GO annotations
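
The first clustering step on slide 5 (ranking genes by Pearson correlation with a query gene and keeping the top k) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene names and expression values are made up, `top_coexpressed` is a hypothetical helper, and the CAST pruning, size correction, and enrichment steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def top_coexpressed(query, expression, k=300):
    """Return the k genes most correlated with `query`, best first."""
    scores = [
        (pearson(expression[query], profile), gene)
        for gene, profile in expression.items()
        if gene != query
    ]
    # Sort by descending correlation; ties are broken by gene name.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [gene for _, gene in scores[:k]]


# Toy expression matrix (hypothetical genes, four conditions each).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # partly correlated with geneA
}
neighbours = top_coexpressed("geneA", expression, k=2)
```

With k=300, as on the slide, the same function would return the co-expression neighbourhood that is then pruned and tested for functional enrichment.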
  • 6. Biological biases in gene function prediction
  • 7. Construction of Arabidopsis Functional Gene Modules Heyndrickx and Vandepoele, Plant Phys. 2012
  • 8. Properties and overlap of functional gene modules

      Datatype                Primary Data                          Modules
                              # Genes   # Associations (% unique)   # Genes   # Modules (% unique)   Functional Enrichment   Motif Enrichment
      PPI                     3,194     7,210 (75%)                 597       72 (95%)               51                      43
      AraNet                  19,647    1,062,222 (99%)             6,377     419 (99%)              116                     172
      TF targets              9,422     13,037 (99%)                5,127     518 (96%)              51                      224
      GO                      6,588     89,100 (n.a.)               7,750     1,105 (99%)            943                     341
      Total                   22,492    1,089,661                   13,428    2,114                  1,161                   -
      Non-redundant modules   -         -                           13,142    1,563                  676                     772

      >99% of modules found through a single input data type
      http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
  • 9. Module-based gene function inference
      • PPI module: predicted to be involved in DNA endoreduplication
      • AT1G06590 recently experimentally validated by Quimbaya et al., 2012
      [Figure panels: Plant Mutants; Flow Cytometry]
      Quimbaya, Vandepoele,… De Veylder, 2012
  • 10. Module-based recovery of new experimental gene functions
      Header labels: Data freeze; Evaluation

                      Unknown                 Unknown Exp. BP         Other Exp. BP           Total
                      #Pred   #Conf           #Pred   #Conf           #Pred   #Conf           #Pred   #Conf
      All Genes       197     75 (38.1%)      255     108 (42.4%)     1,008   251 (24.9%)     1,460   434 (29.7%)
      Conserved       166     65 (39.2%)      195     80 (41%)        871     215 (24.7%)     1,232   360 (29.2%)
      Not Conserved   48      10 (20.8%)      83      31 (37.3%)      315     52 (16.5%)      446     93 (20.9%)

      1,460 Arabidopsis genes with predictions receive new exp. GO-BP
      Heyndrickx and Vandepoele, Plant Phys. 2012
  • 11. Sorting out functionally conserved plant orthologs
      • Expression Context Conservation scores (p-value < 0.05)
      • Protein integrative orthology
      • Inparalogs (species-specific duplicates)
      [Figure: ubiquitin-activating enzyme (E1)]
      Movahedi, … & Vandepoele, Plant, Cell & Environment 2012
  • 12. 2. Transcriptional Gene Regulatory Networks
      Arabidopsis
      • 1,700-2,500 transcription factors
      • 180-791 miRNAs
      • 2,708 expressed lncRNAs
      • 49 Mb non-coding DNA
      • 11,000 regulatory interactions (AtRegNet)
      Mejia-Guerra et al., 2012
  • 13. Phylogenetic footprinting: detection of Conserved Non-coding Sequences (CNS)
      • Comparative analysis of non-coding DNA sequences to identify candidate regulatory elements (in orthologous genes)
      • Regulatory elements are conserved during evolution due to functional constraint (vs. neutral carry-over)
      • The power of phylogenetic footprinting increases significantly when data are available from several related species that have diverged sufficiently
  • 14. Developing an ensemble framework for phylogenetic footprinting in plants
      • Application of motif mapping and different pairwise alignment tools
      • Aggregate alignments into a multi-species footprint using 11 comparator dicot genomes
      • Evaluate statistical significance, incl. FDR analysis
      [Figure labels: AtProbe; Feature map @ RSAT; 144 regulatory elements (63 genes); 774 DNA motifs]
  • 15. From pairwise alignments to multi-species footprints
      • Generate all pairwise alignments between the Arabidopsis query gene and its orthologs
      • Map all pairwise alignments back to the reference promoter
      • Count, per position, the number of species that support a footprint
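
The counting step on slide 15 can be sketched as below. It assumes the pairwise alignments have already been mapped to reference promoter coordinates as 0-based, end-exclusive intervals. The species names, the intervals, and the `min_species` threshold are illustrative assumptions, and the significance and FDR evaluation mentioned on slide 14 is not modelled.

```python
def multi_species_footprint(promoter_length, alignments):
    """Count, per promoter position, how many species support a footprint.

    `alignments` maps a species name to a list of (start, end) intervals in
    reference promoter coordinates (0-based, end-exclusive).
    """
    support = [0] * promoter_length
    for intervals in alignments.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(max(0, start), min(end, promoter_length)))
        # Each species contributes at most once per position.
        for pos in covered:
            support[pos] += 1
    return support


def conserved_regions(support, min_species):
    """Return (start, end) runs where at least `min_species` species agree."""
    regions, start = [], None
    for pos, count in enumerate(support):
        if count >= min_species and start is None:
            start = pos
        elif count < min_species and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(support)))
    return regions


# Hypothetical aligned intervals for three comparator species.
alignments = {
    "species_1": [(2, 8)],
    "species_2": [(4, 10), (5, 7)],  # overlapping hits count once per position
    "species_3": [(5, 9)],
}
support = multi_species_footprint(12, alignments)
regions = conserved_regions(support, min_species=3)
```

Collecting positions in a per-species set keeps overlapping alignments from the same species from inflating the count, so `support` really is a number of species rather than a number of alignments.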
  • 16. Evaluation: AtProbe experimental cis-regulatory elements
      [Figure: significance of experimental motifs]
      • Scmm ACGTGGC = 0.54, P value < 0.001 (G-box)
      • Scmm ATAGATAA = 0.09, P value 0.48 (GA motif)
      • Scmm GATAAGATT = 0.36, P value < 0.001 (I-box)
      • Scmm TATATATA = 0.7, P value < 0.001
      Other figure labels: RBCS1A; GAPA; ACA motif; C-motif
  • 17. Properties of CNSs
      • 69,361 CNSs associated with 17,895 genes
      • Protein-coding genes (99%), miRNA genes (1%)
      • Median length: 11 nt (min-max: 5-514 nt)
      • CNSs cover 1,070 kb of the non-coding Arabidopsis genome
  • 18. CNSs identify in vivo functional targets
      • Integrate known binding site data for 199 TFs
      • Translate CNSs into 40,758 TF-target interactions
      • Compare with experimental functional TF targets: ChIP-Seq binding + TF binding site + differentially expressed during TF perturbation (n = 2,708)
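
One simple way to picture the "translate CNS into TF-target interactions" step on slide 18 is to scan each CNS for known TF binding motifs on both strands. The sketch below uses exact consensus matching, which is a simplifying assumption; the slides do not specify how binding sites were matched. The E2F consensus TTTCCGC comes from slide 4, while `TF_X`, the gene names, and the CNS sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def cns_to_interactions(cns_records, tf_motifs):
    """Return the set of (TF, gene) pairs whose motif occurs in a gene's CNS.

    `cns_records` is a list of (gene, cns_sequence) pairs; `tf_motifs` maps a
    TF name to a consensus motif. Both strands are checked.
    """
    interactions = set()
    for gene, cns_seq in cns_records:
        for tf, motif in tf_motifs.items():
            if motif in cns_seq or reverse_complement(motif) in cns_seq:
                interactions.add((tf, gene))
    return interactions


tf_motifs = {
    "E2F": "TTTCCGC",  # consensus shown on slide 4
    "TF_X": "CACGTG",  # hypothetical TF with a palindromic motif
}
cns_records = [
    ("gene_1", "AATTTCCGCAA"),  # E2F motif on the forward strand
    ("gene_2", "GGGCGGAAATT"),  # E2F motif on the reverse strand
    ("gene_3", "TTCACGTGTT"),   # TF_X motif
    ("gene_4", "ATATATATAT"),   # no motif
]
interactions = cns_to_interactions(cns_records, tf_motifs)
```

Returning a set means a TF with several motif hits near the same gene still yields a single interaction.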
  • 19. Genome-wide regulatory annotation
      Collapsed TF-target module network
      • 3,085 TF-target interactions
      • 9/13 TFs show significant overlap with experimentally confirmed targets (AtRegNet)
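
Slide 19 reports significant overlap between predicted and experimentally confirmed targets but does not name the test. A common choice for this kind of comparison is a one-sided hypergeometric test, sketched here as an assumption rather than as the method used in the talk. All counts in the example are invented.

```python
from math import comb


def overlap_pvalue(n_universe, n_predicted, n_experimental, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    X is the number of experimental targets found among `n_predicted` genes
    drawn without replacement from a universe of `n_universe` genes that
    contains `n_experimental` experimental targets.
    """
    max_overlap = min(n_predicted, n_experimental)
    total = comb(n_universe, n_predicted)
    tail = sum(
        comb(n_experimental, k) * comb(n_universe - n_experimental, n_predicted - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Invented example: 20 genes, 5 predicted targets, 5 experimental targets,
# 4 of which are shared.
p_value = overlap_pvalue(n_universe=20, n_predicted=5, n_experimental=5, n_overlap=4)
```

Running one such test per TF would also call for a multiple-testing correction, which is left out of this sketch.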
  • 20. Conclusions
      • Co-expression analysis offers a pragmatic means to
          • explore new gene functions in plants
          • sort out many-to-many orthologs across species
      • Integration of complementary experimental data sources enlarges our view on functional gene modules
      • Enhanced multi-species phylogenetic footprinting method to identify genome-wide conserved non-coding sequences (CNS)
      • Integration of CNS with complementary experimental data sources offers new possibilities for regulatory gene annotation in plants
  • 21. Further reading
      • Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant Cell Environ 35, 1787-1798.
      • Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol 159, 884-901.
      • Proost, S.*, Van Bel, M.*, Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21, 3718-3731.
      • Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol 14, R134.