Inferring gene functions and regulatory interactions in plants using comparative genomics
1. Inferring gene functions and regulatory
interactions in plants using different
flavors of comparative genomics
Vienna, February 2014
Comparative & Integrative Genomics group
Department of Plant Biotechnology and Bioinformatics, Ghent University
Department of Plant Systems Biology, VIB - Belgium
http://twitter.com/plaza_genomics
Klaas Vandepoele
2. Overview
1. Comparative co-expression analysis
Principles & workflow
Applications
2. Inference of transcriptional networks using an ensemble
framework for phylogenetic footprinting in plants
Methodology
Experimental evaluation
Genome-wide regulatory annotation
3. Conclusions & perspectives
Jan Van de Velde Sara Movahedi
Ken Heyndrickx
8. Properties and overlap of functional gene modules
Datatype               Primary data                          Modules
                       # Genes  # Associations (% unique)    # Genes  # Modules (% unique)  Functional enrichment  Motif enrichment
PPI                    3,194    7,210 (75%)                  597      72 (95%)              51                     43
AraNet                 19,647   1,062,222 (99%)              6,377    419 (99%)             116                    172
TF targets             9,422    13,037 (99%)                 5,127    518 (96%)             51                     224
GO                     6,588    89,100 (n.a.)                7,750    1,105 (99%)           943                    341
Total                  22,492   1,089,661                    13,428   2,114                 1,161
Non-redundant modules                                        13,142   1,563                 676                    772
>99% of modules are found through a single input data type
http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
9. Module-based Gene function inference
• PPI module: predicted to be
involved in DNA endoreduplication
• AT1G06590 recently
experimentally validated by
Quimbaya et al., 2012
Figure panels: plant mutants, flow cytometry
Quimbaya, Vandepoele,… De Veylder, 2012
10. Module-based recovery of new experimental gene
functions
Data freeze Evaluation
Unknown Unknown Exp. BP Other Exp. BP Total
               #Pred  #Conf        #Pred  #Conf        #Pred  #Conf        #Pred  #Conf
All genes      197    75 (38.1%)   255    108 (42.4%)  1,008  251 (24.9%)  1,460  434 (29.7%)
Conserved      166    65 (39.2%)   195    80 (41.0%)   871    215 (24.7%)  1,232  360 (29.2%)
Not conserved  48     10 (20.8%)   83     31 (37.3%)   315    52 (16.5%)   446    93 (20.9%)
1,460 Arabidopsis genes with predictions received new experimental GO-BP annotations
Heyndrickx and Vandepoele, Plant Phys. 2012
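The percentages in the table above appear to be the number of confirmed predictions divided by the number of predictions. A minimal sketch that recomputes the "Total" column under that assumption (the dictionary layout and the function name are illustrative, not taken from the slides):

```python
# (#Pred, #Conf) pairs copied from the "Total" columns of the table.
totals = {
    "All genes": (1460, 434),
    "Conserved": (1232, 360),
    "Not conserved": (446, 93),
}


def confirmation_rate(n_predicted, n_confirmed):
    """Percentage of predictions that were confirmed, rounded to one decimal."""
    return round(100.0 * n_confirmed / n_predicted, 1)


# Recompute the confirmation rate for every gene set.
rates = {name: confirmation_rate(pred, conf) for name, (pred, conf) in totals.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The recomputed values match the percentages shown in the table, which supports reading #Conf as a subset of #Pred.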
13. Phylogenetic footprinting: detection of Conserved
Non-coding Sequences (CNS)
Comparative analysis of noncoding DNA sequences to identify
candidate regulatory elements (in orthologous genes)
Regulatory elements are conserved during evolution due to
functional constraint (vs. neutral carry-over)
The power of phylogenetic footprinting increases significantly
when data are available from several related species that have
diverged sufficiently
14. Developing an ensemble framework for phylogenetic
footprinting in plants
Application of motif mapping and
different pairwise alignment tools
Aggregate alignments into a multi-species
footprint using 11 comparator
dicot genomes
Evaluate statistical significance incl.
FDR analysis
AtProbe
Feature map
@ RSAT
144 regulatory elements (63 genes)
774 DNA motifs
15. From pairwise alignments to multi-species footprints
Generate all pairwise alignments
between Arabidopsis query gene
and its orthologs
Map all pairwise alignments back
to reference promoter
Count per position the #species
that support a footprint
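The three steps above can be sketched in a few lines. This is an illustration only: it assumes each pairwise alignment has already been mapped back to half-open (start, end) intervals on the reference promoter, and the species names, the function names and the `min_species` threshold are hypothetical. The slides do not state how the significance cut-off is derived, so the threshold here is a stand-in.

```python
def species_support(promoter_length, alignments):
    """Count, for every promoter position, how many species have an aligned block there.

    alignments maps a species name to a list of half-open (start, end)
    intervals on the reference promoter.
    """
    support = [0] * promoter_length
    for intervals in alignments.values():
        covered = set()
        for start, end in intervals:
            # Clip to the promoter and merge overlapping blocks of one species,
            # so each species is counted at most once per position.
            covered.update(range(max(0, start), min(promoter_length, end)))
        for pos in covered:
            support[pos] += 1
    return support


def footprints(support, min_species):
    """Return maximal runs of positions supported by at least min_species species."""
    runs, run_start = [], None
    for pos, n_species in enumerate(support):
        if n_species >= min_species and run_start is None:
            run_start = pos
        elif n_species < min_species and run_start is not None:
            runs.append((run_start, pos))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(support)))
    return runs


# Toy example with three hypothetical comparator species.
example = {
    "species_a": [(2, 8)],
    "species_b": [(4, 10), (5, 7)],
    "species_c": [(6, 12)],
}
support = species_support(12, example)
print(support)
print(footprints(support, min_species=2))
```

Counting each species once per position keeps a species with several overlapping alignments from inflating the support.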
16. Evaluation using AtProbe experimental cis-regulatory
elements
Significance of experimental motifs (figure; panel labels RBCS1A and GAPA)
Motif        Sequence    Scmm   P value
G-box        ACGTGGC     0.54   < 0.001
GA motif     ATAGATAA    0.09   0.48
I-box        GATAAGATT   0.36   < 0.001
(unlabeled)  TATATATA    0.7    < 0.001
Other motif labels in the figure: ACA motif, C-motif
17. Properties of CNSs
69,361 CNSs associated with 17,895 genes
Protein-coding genes (99%), miRNA genes (1%)
Median length: 11 nt (min-max: 5-514 nt)
CNSs cover 1,070 kb of the non-coding Arabidopsis genome
18. CNSs identify in vivo functional targets
Integrate known binding site data for 199 TFs
Translate CNS into 40,758 TF-target interactions
Compare with experimental functional TF targets
ChIP-Seq binding + TF binding site + differentially
expressed during TF perturbation (n=2708)
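One way to picture the translation of CNSs into TF-target interactions is as an interval overlap between known binding sites and CNSs, followed by a comparison with an experimental target set. The sketch below is a hedged illustration, not the method used in the talk: requiring a site to lie fully inside a CNS, the data layout, the identifiers (`TF_A`, `gene1`, ...) and the recovery measure are all assumptions.

```python
def tf_target_interactions(cns_by_gene, sites_by_tf):
    """Link a TF to a gene when one of its binding sites lies inside a CNS of that gene.

    cns_by_gene: gene -> list of half-open (start, end) CNS intervals.
    sites_by_tf: TF -> list of (gene, start, end) binding-site matches,
                 in the same coordinate system as the CNS intervals.
    """
    interactions = set()
    for tf, sites in sites_by_tf.items():
        for gene, site_start, site_end in sites:
            for cns_start, cns_end in cns_by_gene.get(gene, []):
                # Assumption: the site must be fully contained in the CNS.
                if cns_start <= site_start and site_end <= cns_end:
                    interactions.add((tf, gene))
                    break
    return interactions


def recovery(predicted, experimental):
    """Fraction of experimental TF-target pairs that are also predicted."""
    if not experimental:
        return 0.0
    return len(predicted & experimental) / len(experimental)


# Toy data with hypothetical identifiers.
cns = {"gene1": [(100, 120)], "gene2": [(50, 60)]}
sites = {
    "TF_A": [("gene1", 105, 112), ("gene2", 58, 66)],
    "TF_B": [("gene2", 52, 58), ("gene3", 10, 18)],
}
predicted = tf_target_interactions(cns, sites)
experimental = {("TF_A", "gene1"), ("TF_A", "gene2")}
print(sorted(predicted))
print(recovery(predicted, experimental))
```

In the toy data the TF_A site on gene2 extends past the CNS and is therefore not counted; a looser overlap rule would change the predicted set.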
20. Conclusions
Co-expression analysis offers a pragmatic means to:
• explore new gene functions in plants
• sort out many-to-many orthologs across species
Integration of complementary experimental data sources enlarges
our view on functional gene modules
Enhanced multi-species phylogenetic footprinting method to identify
genome-wide conserved non-coding sequences (CNS)
Integration of CNS with complementary experimental data sources
offers new possibilities for regulatory gene annotation in plants
21. Further reading
Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative
co-expression analysis in plant biology. Plant Cell Environ 35, 1787-1798.
Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of
functional plant modules through the integration of complementary data sources.
Plant Physiol 159, 884-901.
Proost, S.*, Van Bel, M.*, Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and
Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene
and genome evolution in plants. Plant Cell 21, 3718-3731.
Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and
Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and
comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol 14, R134.