
Inferring gene functions and regulatory interactions in plants using different flavors of comparative genomics


Slides from an invited talk at the VISCEA conference Plant Gene Discovery & “Omics” Technologies, February 17-18, 2014, Vienna, Austria.

Reference for the CNS work: http://www.plantcell.org/content/early/2014/07/02/tpc.114.127001.abstract

Published in: Science

  1. Inferring gene functions and regulatory interactions in plants using different flavors of comparative genomics
     Klaas Vandepoele
     Comparative & Integrative Genomics group, Department of Plant Biotechnology and Bioinformatics, Ghent University; Department of Plant Systems Biology, VIB, Belgium
     Vienna, February 2014
     http://twitter.com/plaza_genomics
  2. Overview
     1. Comparative co-expression analysis
        • Principles & workflow
        • Applications
     2. Inference of transcriptional networks using an ensemble framework for phylogenetic footprinting in plants
        • Methodology
        • Experimental evaluation
        • Genome-wide regulatory annotation
     3. Conclusions & perspectives
     Jan Van de Velde, Sara Movahedi, Ken Heyndrickx
  3. Part 1: Comparative co-expression analysis
     Movahedi, … & Vandepoele (2012) Plant, Cell & Environment
  4. 3-species co-expression analysis: ETG1
     • Conserved DNA replication module
     • Conserved E2F target gene (TTTCCGC)
     • Conserved role in sister chromatid cohesion*
     *Takahashi et al., PLoS Genetics 2010
  5. Function prediction using co-expression analysis
     • Recovery of experimental functions using cross-species co-expression
       [Figure: cross-species vs. Arabidopsis-only co-expression]
     Clustering
     • Top 300 co-expressed genes based on the Pearson product-moment correlation coefficient
     • Cluster Affinity Search Technique (CAST) pruning
     • Size correction between cross-species and Arabidopsis gene co-expression clusters
     • Functional enrichment analysis
     Benchmark
     • 1,402 Gene Ontology Biological Process terms (10-1,000 genes each); n = 119,467 non-electronic gene-GO annotations
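The first clustering step on slide 5 ranks all genes by their Pearson correlation with a query gene and keeps the top 300. A minimal sketch of that ranking step, assuming expression profiles are stored as a dict of equal-length value lists; the gene IDs below are hypothetical, and the CAST pruning and cross-species size correction are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0  # flat profiles get correlation 0


def top_coexpressed(expr, query, n=300):
    """Rank all other genes by PCC with `query` and keep the n highest.

    expr: dict gene ID -> list of expression values (same samples, same order).
    """
    q = expr[query]
    scored = [(g, pearson(q, prof)) for g, prof in expr.items() if g != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Hypothetical toy data: four samples per gene.
expr = {
    "AT_Q": [1.0, 2.0, 3.0, 4.0],
    "AT_A": [2.0, 4.0, 6.0, 8.1],  # almost perfectly co-expressed with AT_Q
    "AT_B": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with AT_Q
    "AT_C": [1.0, 1.5, 1.2, 1.4],  # weakly correlated with AT_Q
}
print(top_coexpressed(expr, "AT_Q", n=2))
```

With n=300 on a genome-wide matrix, the returned list is the co-expression neighborhood that subsequent pruning and enrichment steps would operate on.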
  6. Biological biases in gene function prediction
  7. Construction of Arabidopsis functional gene modules
     Heyndrickx and Vandepoele, Plant Phys. 2012
  8. Properties and overlap of functional gene modules
     | Datatype | Primary data: # genes | Primary data: # associations (% unique) | Modules: # genes | Modules: # modules (% unique) | Functional enrichment | Motif enrichment |
     |---|---|---|---|---|---|---|
     | PPI | 3,194 | 7,210 (75%) | 597 | 72 (95%) | 51 | 43 |
     | AraNet | 19,647 | 1,062,222 (99%) | 6,377 | 419 (99%) | 116 | 172 |
     | TF targets | 9,422 | 13,037 (99%) | 5,127 | 518 (96%) | 51 | 224 |
     | GO | 6,588 | 89,100 (n.a.) | 7,750 | 1,105 (99%) | 943 | 341 |
     | Total | 22,492 | 1,089,661 | 13,428 | 2,114 | 1,161 | |
     | Non-redundant modules | | | 13,142 | 1,563 | 676 | 772 |
     • >99% of modules are found through a single input data type
     http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
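Slide 8 counts modules with functional enrichment but does not say which statistical test was used. A one-sided hypergeometric test is a common choice for GO enrichment of a gene set, and is assumed in this minimal sketch; the module size, background size and annotation counts below are hypothetical.

```python
from math import comb


def enrichment_p(k, N, K, n):
    """One-sided hypergeometric P(X >= k).

    N: background genes, K: background genes annotated with the term,
    n: genes in the module, k: module genes annotated with the term.
    """
    upper = min(K, n)
    if k > upper:
        return 0.0
    # Sum the probability mass of observing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical module: 20 genes, 8 of them carry a GO term that annotates
# 200 of 20,000 background genes (expected by chance: 20 * 200 / 20,000 = 0.2).
p = enrichment_p(k=8, N=20_000, K=200, n=20)
print(f"{p:.2e}")
```

When many terms are tested per module, the resulting p-values would still need a multiple-testing correction before calling a module enriched.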
  9. Module-based gene function inference
     • PPI module: predicted to be involved in DNA endoreduplication
     • AT1G06590 recently experimentally validated by Quimbaya et al., 2012
     [Figure panels: plant mutants; flow cytometry]
     Quimbaya, Vandepoele,… De Veylder, 2012
  10. Module-based recovery of new experimental gene functions
      Data freeze vs. evaluation (#Pred = number of predictions; #Conf = number confirmed)
      | Genes | Unknown #Pred | Unknown #Conf | Unknown Exp. BP #Pred | Unknown Exp. BP #Conf | Other Exp. BP #Pred | Other Exp. BP #Conf | Total #Pred | Total #Conf |
      |---|---|---|---|---|---|---|---|---|
      | All genes | 197 | 75 (38.1%) | 255 | 108 (42.4%) | 1,008 | 251 (24.9%) | 1,460 | 434 (29.7%) |
      | Conserved | 166 | 65 (39.2%) | 195 | 80 (41%) | 871 | 215 (24.7%) | 1,232 | 360 (29.2%) |
      | Not conserved | 48 | 10 (20.8%) | 83 | 31 (37.3%) | 315 | 52 (16.5%) | 446 | 93 (20.9%) |
      • 1,460 Arabidopsis genes with predictions receive new experimental GO-BP annotations
      Heyndrickx and Vandepoele, Plant Phys. 2012
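The percentages on slide 10 are confirmed predictions divided by total predictions, and the Total column is the sum of the three gene categories. A short check of that arithmetic, using the counts from the slide:

```python
def confirmation_rate(pred, conf):
    """Percentage of predictions confirmed at evaluation, rounded to one decimal."""
    return round(100 * conf / pred, 1)


# (#Pred, #Conf) for the Unknown, Unknown Exp. BP and Other Exp. BP columns (slide 10).
counts = {
    "All genes": [(197, 75), (255, 108), (1008, 251)],
    "Conserved": [(166, 65), (195, 80), (871, 215)],
    "Not conserved": [(48, 10), (83, 31), (315, 52)],
}

for row, groups in counts.items():
    total_pred = sum(p for p, _ in groups)
    total_conf = sum(c for _, c in groups)
    print(row, total_pred, total_conf, confirmation_rate(total_pred, total_conf))
```

The loop reproduces the Total column (1,460/434, 1,232/360 and 446/93), making the lower confirmation rate for non-conserved predictions (20.9% vs. 29.2%) easy to recompute.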
  11. Sorting out functionally conserved plant orthologs
      • Expression Context Conservation scores (p-value < 0.05)
      • Protein integrative orthology
      • Inparalogs (species-specific duplicates)
      [Figure example: ubiquitin-activating enzyme (E1)]
      Movahedi, … & Vandepoele, Plant, Cell & Environment 2012
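The idea behind an expression context score is that a functionally conserved ortholog pair should have co-expression neighborhoods that share many orthologs. The sketch below is a simplified illustration under stated assumptions, not necessarily the published Expression Context Conservation definition: it counts neighbors of gene A whose ortholog lies in the neighborhood of gene B, and estimates significance empirically from random neighborhoods of the same size. All gene IDs and the ortholog map are hypothetical.

```python
import random


def neighborhood_overlap(nbr_a, nbr_b, orthologs):
    """Count species-A neighbors with at least one ortholog in the species-B neighborhood.

    orthologs: dict species-A gene -> list of species-B orthologs (many-to-many allowed).
    """
    nbr_b = set(nbr_b)
    return sum(1 for g in nbr_a if any(o in nbr_b for o in orthologs.get(g, ())))


def empirical_p(nbr_a, nbr_b, orthologs, genes_b, n_perm=1000, seed=0):
    """Compare the observed overlap with overlaps of random species-B neighborhoods."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = neighborhood_overlap(nbr_a, nbr_b, orthologs)
    hits = 0
    for _ in range(n_perm):
        random_b = rng.sample(genes_b, len(nbr_b))
        if neighborhood_overlap(nbr_a, random_b, orthologs) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: 200 species-B genes, one-to-one orthologs for five neighbors.
genes_b = [f"Os{i:03d}" for i in range(200)]
orthologs = {f"At{i}": [f"Os{i - 1:03d}"] for i in range(1, 6)}
nbr_a = ["At1", "At2", "At3", "At4", "At5"]
nbr_b = ["Os000", "Os001", "Os002", "Os003", "Os150"]
print(empirical_p(nbr_a, nbr_b, orthologs, genes_b))
```

Scoring every member of a many-to-many ortholog family this way and keeping pairs below p < 0.05 is one way to single out the copies that retained the ancestral expression context.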
  12. Part 2: Transcriptional gene regulatory networks
      Mejia-Guerra et al., 2012
      Arabidopsis:
      • 1,700-2,500 transcription factors
      • 180-791 miRNAs
      • 2,708 expressed lncRNAs
      • 49 Mb non-coding DNA
      • 11,000 regulatory interactions (AtRegNet)
  13. Phylogenetic footprinting: detection of conserved non-coding sequences (CNSs)
      • Comparative analysis of non-coding DNA sequences (in orthologous genes) to identify candidate regulatory elements
      • Regulatory elements are conserved during evolution because of functional constraint (vs. neutral carry-over)
      • The power of phylogenetic footprinting increases significantly when data are available from several related species that have diverged sufficiently
  14. Developing an ensemble framework for phylogenetic footprinting in plants
      • Application of motif mapping and different pairwise alignment tools
      • Aggregation of alignments into a multi-species footprint using 11 comparator dicot genomes
      • Evaluation of statistical significance, including FDR analysis
      AtProbe: 144 regulatory elements (63 genes); feature map @ RSAT
      774 DNA motifs
  15. From pairwise alignments to multi-species footprints
      • Generate all pairwise alignments between the Arabidopsis query gene and its orthologs
      • Map all pairwise alignments back to the reference promoter
      • Count, per position, the number of species that support a footprint
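The three steps on slide 15 reduce to a per-position counter over the reference promoter. A minimal sketch, assuming each comparator species contributes a list of half-open (start, end) intervals already mapped onto Arabidopsis promoter coordinates; species names, coordinates and the `min_species` cut-off are hypothetical, and the significance/FDR evaluation of the actual framework is not modeled.

```python
def species_support(promoter_len, alignments):
    """Count, per reference promoter position, how many species support a footprint.

    alignments: dict species -> list of (start, end) half-open intervals on the
    reference promoter covered by that species' pairwise alignment blocks.
    """
    support = [0] * promoter_len
    for blocks in alignments.values():
        covered = set()
        for start, end in blocks:
            covered.update(range(max(0, start), min(end, promoter_len)))
        for pos in covered:  # overlapping blocks of one species count only once
            support[pos] += 1
    return support


def footprints(support, min_species):
    """Merge consecutive positions supported by >= min_species into (start, end) intervals."""
    out, start = [], None
    for pos, n in enumerate(support):
        if n >= min_species and start is None:
            start = pos
        elif n < min_species and start is not None:
            out.append((start, pos))
            start = None
    if start is not None:
        out.append((start, len(support)))
    return out


# Hypothetical alignments of three comparator species onto a 30-bp promoter.
alignments = {
    "species_1": [(5, 15)],
    "species_2": [(8, 20)],
    "species_3": [(10, 12), (11, 18)],
}
support = species_support(30, alignments)
print(footprints(support, min_species=2), footprints(support, min_species=3))
```

Raising `min_species` trades sensitivity for specificity: the same alignments yield a longer footprint at two supporting species than at three.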
  16. Evaluation on AtProbe experimental cis-regulatory elements
      Significance of experimental motifs:
      • ACGTGGC (G-box): Scmm = 0.54, P value < 0.001
      • ATAGATAA (GA motif): Scmm = 0.09, P value = 0.48
      • GATAAGATT (I-box): Scmm = 0.36, P value < 0.001
      • TATATATA: Scmm = 0.7, P value < 0.001
      [Figure labels: RBCS1A, GAPA, ACA motif, C-motif]
  17. Properties of CNSs
      • 69,361 CNSs associated with 17,895 genes
      • Protein-coding genes (99%), miRNA genes (1%)
      • Median length: 11 nt (min-max: 5-514 nt)
      • CNSs cover 1,070 kb of the non-coding Arabidopsis genome
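Slide 17 summarizes CNSs by median length and by total genomic coverage. If CNS coordinates can overlap (an assumption, e.g. when neighboring genes share upstream sequence), summing lengths would double-count bases, so coverage is computed here on merged intervals. A minimal sketch on hypothetical (chromosome, start, end) coordinates:

```python
from statistics import median


def merged_coverage(intervals):
    """Total bp covered by (chrom, start, end) half-open intervals, counting overlaps once."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    total = 0
    for blocks in by_chrom.values():
        blocks.sort()
        cur_start, cur_end = blocks[0]
        for start, end in blocks[1:]:
            if start > cur_end:  # gap: close the current merged block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:  # overlapping or touching: extend the current block
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


# Hypothetical CNS coordinates; the first two overlap on Chr1.
cns = [("Chr1", 100, 111), ("Chr1", 105, 120), ("Chr1", 200, 205), ("Chr2", 50, 64)]
lengths = [end - start for _, start, end in cns]
print(median(lengths), merged_coverage(cns))
```

On this toy input the summed lengths (45 bp) overstate the merged coverage (39 bp), which is the discrepancy the merging step avoids.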
  18. CNSs identify in vivo functional targets
      • Integrate known binding site data for 199 TFs
      • Translate CNSs into 40,758 TF-target interactions
      • Compare with experimentally determined functional TF targets: ChIP-Seq binding + TF binding site + differential expression during TF perturbation (n = 2,708)
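Translating CNSs into TF-target interactions amounts to asking which TF binding sites occur in the CNSs assigned to each gene. A minimal sketch, assuming exact string matching of a consensus site on both strands (the actual motif-mapping approach may use other matching criteria); the TF names, gene IDs and sequences are hypothetical, while CACGTG (G-box core) and GATAAG (I-box core) are used only as example sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cns_to_tf_targets(cns_records, tf_sites):
    """Return (TF, target gene) pairs where a TF site occurs on either strand of a CNS.

    cns_records: list of (target_gene, cns_sequence)
    tf_sites: dict TF -> exact consensus site (IUPAC codes not handled)
    """
    pairs = set()
    for gene, seq in cns_records:
        seq = seq.upper()
        rc = revcomp(seq)
        for tf, site in tf_sites.items():
            if site in seq or site in rc:
                pairs.add((tf, gene))
    return pairs


def overlap_with_experimental(predicted, experimental):
    """Number and fraction of predicted pairs that are also in the experimental set."""
    shared = predicted & experimental
    return len(shared), (len(shared) / len(predicted) if predicted else 0.0)


tf_sites = {"TF_X": "CACGTG", "TF_Y": "GATAAG"}
cns_records = [("GENE_1", "ttCACGTGaa"), ("GENE_2", "AACTTATCC"), ("GENE_3", "ACGTACGT")]
predicted = cns_to_tf_targets(cns_records, tf_sites)
experimental = {("TF_X", "GENE_1"), ("TF_X", "GENE_3")}
print(sorted(predicted), overlap_with_experimental(predicted, experimental))
```

GENE_2 is only found through its reverse complement (GGATAAGTT), which is why both strands are scanned; the experimental set stands in for targets supported by ChIP-Seq binding plus differential expression.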
  19. Genome-wide regulatory annotation
      Collapsed TF-target module network
      • 3,085 TF-target interactions
      • 9/13 TFs show significant overlap with experimentally confirmed targets (AtRegNet)
  20. Conclusions
      • Co-expression analysis offers a pragmatic means to
        - explore new gene functions in plants
        - sort out many-to-many orthologs across species
      • Integration of complementary experimental data sources enlarges our view of functional gene modules
      • An enhanced multi-species phylogenetic footprinting method identifies genome-wide conserved non-coding sequences (CNSs)
      • Integration of CNSs with complementary experimental data sources offers new possibilities for regulatory gene annotation in plants
  21. Further reading
      • Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant Cell Environ. 35, 1787-1798.
      • Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol. 159, 884-901.
      • Proost, S.*, Van Bel, M.*, Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21, 3718-3731.
      • Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol. 14, R134.

×