Inferring gene functions and regulatory
interactions in plants using different
flavors of comparative genomics
Vienna, February 2014
Comparative & Integrative Genomics group
Department of Plant Biotechnology and Bioinformatics, Ghent University
Department of Plant Systems Biology, VIB - Belgium
http://twitter.com/plaza_genomics
Klaas Vandepoele
Overview
1. Comparative co-expression analysis
• Principles & workflow
• Applications
2. Inference of transcriptional networks using an ensemble
framework for phylogenetic footprinting in plants
• Methodology
• Experimental evaluation
• Genome-wide regulatory annotation
3. Conclusions & perspectives
Jan Van de Velde Sara Movahedi
Ken Heyndrickx
1. Comparative co-expression analysis
Movahedi, … & Vandepoele (2012) Plant, Cell & Environment
3-species co-expression analysis: ETG1
• Conserved DNA replication module
• Conserved E2F target gene (TTTCCGC)
• Conserved role in sister chromatid cohesion*
*Takahashi et al., PLoS Genetics 2010
Function prediction using co-expression analysis
• Recovery of experimental functions using cross-species co-expression
[Figure: recovery of experimental functions, cross-species vs. Arabidopsis-only clusters; one axis log-scaled from 1 to 1,000,000, the other running from 0.00 to 0.25]
Clustering
• Top 300 co-expressed genes based on the Pearson product-moment correlation coefficient
• Cluster Affinity Search Technique (CAST) pruning
• Size correction between cross-species and Arabidopsis gene co-expression clusters
• Functional enrichment analysis
Benchmark
• 1,402 Gene Ontology Biological Process terms (10-1,000 genes); n=119,467 non-electronic gene-GO annotations
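The "top 300 co-expressed genes" step can be illustrated with a minimal sketch. It ranks genes by Pearson correlation against a query gene and keeps the best k. The gene names, the toy expression values, and the function names `pearson` and `top_coexpressed` are illustrative assumptions, not part of the original analysis, and the CAST pruning and size-correction steps are not covered.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def top_coexpressed(expression, query, k=300):
    """Return the k genes most correlated with `query`, best first."""
    scores = [
        (pearson(expression[query], profile), gene)
        for gene, profile in expression.items()
        if gene != query
    ]
    scores.sort(reverse=True)
    return [gene for _, gene in scores[:k]]

# Toy expression matrix: hypothetical gene -> values across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
neighbours = top_coexpressed(expression, "geneA", k=2)
print(neighbours)
```

With real data, k=300 would reproduce the neighbourhood size quoted on the slide. The toy example uses k=2 only to keep the output readable.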
Biological biases in gene function prediction
Construction of Arabidopsis Functional Gene Modules
Heyndrickx and Vandepoele, Plant Phys. 2012
Properties and overlap of functional gene modules
Datatype | Primary data: # Genes | Primary data: # Associations (% unique) | Modules: # Genes | Modules: # Modules (% unique) | Functional Enrichment | Motif Enrichment
PPI | 3,194 | 7,210 (75%) | 597 | 72 (95%) | 51 | 43
AraNet | 19,647 | 1,062,222 (99%) | 6,377 | 419 (99%) | 116 | 172
TF targets | 9,422 | 13,037 (99%) | 5,127 | 518 (96%) | 51 | 224
GO | 6,588 | 89,100 (n.a.) | 7,750 | 1,105 (99%) | 943 | 341
Total | 22,492 | 1,089,661 | 13,428 | 2,114 | 1,161 |
Non-redundant Modules | | | 13,142 | 1,563 | 676 | 772
>99% modules found through a single input data type
http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
Module-based Gene function inference
• PPI module: predicted to be involved in DNA endoreduplication
• AT1G06590 recently experimentally validated by Quimbaya et al., 2012
[Figure: Plant Mutants Flow Cytometry]
Quimbaya, Vandepoele,… De Veylder, 2012
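A module-level function prediction of this kind usually rests on an enrichment test. The slides do not state which test was used, so the sketch below assumes a one-sided hypergeometric test, a common choice for Gene Ontology enrichment. The function name and the toy numbers are illustrative only.

```python
from math import comb

def hypergeom_enrichment_p(module_size, hits_in_module, annotated_total, genome_size):
    """P(X >= hits_in_module) when module_size genes are drawn without replacement
    from genome_size genes, of which annotated_total carry the annotation."""
    max_hits = min(module_size, annotated_total)
    total = comb(genome_size, module_size)
    tail = sum(
        comb(annotated_total, k) * comb(genome_size - annotated_total, module_size - k)
        for k in range(hits_in_module, max_hits + 1)
    )
    return tail / total

# Toy numbers (assumed): 20 genes in total, 5 annotated with the term,
# a module of 4 genes of which 3 carry the annotation.
p_value = hypergeom_enrichment_p(module_size=4, hits_in_module=3,
                                 annotated_total=5, genome_size=20)
print(round(p_value, 4))
```

If a module is significantly enriched for a term, its unannotated members become candidates for that function. In a genome-wide screen the p-values would also need multiple-testing correction, which is omitted here.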
Module-based recovery of new experimental gene functions
Data freeze / Evaluation | Unknown: #Pred. | Unknown: #Conf. | Unknown Exp. BP: #Pred. | Unknown Exp. BP: #Conf. | Other Exp. BP: #Pred. | Other Exp. BP: #Conf. | Total: #Pred. | Total: #Conf.
All Genes | 197 | 75 (38.1%) | 255 | 108 (42.4%) | 1,008 | 251 (24.9%) | 1,460 | 434 (29.7%)
Conserved | 166 | 65 (39.2%) | 195 | 80 (41%) | 871 | 215 (24.7%) | 1,232 | 360 (29.2%)
Not Conserved | 48 | 10 (20.8%) | 83 | 31 (37.3%) | 315 | 52 (16.5%) | 446 | 93 (20.9%)
1460 Arabidopsis genes with predictions receive new exp. GO-BP
Heyndrickx and Vandepoele, Plant Phys. 2012
Sorting out functionally conserved plant orthologs
Expression Context Conservation scores (p-value < 0.05)
Protein integrative orthology
Inparalogs (species-specific duplicates)
Movahedi, … & Vandepoele, Plant, Cell & Environment 2012
ubiquitin-activating enzyme (E1)
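The idea of scoring expression context conservation between candidate orthologs can be sketched as follows. This is a simplified stand-in, not the published ECC definition. It assumes each gene has a list of co-expressed neighbours and a gene-family assignment, and scores two genes by the Jaccard overlap of the families in their neighbourhoods. All gene and family identifiers are hypothetical, and the significance estimate (p-value < 0.05) is not reproduced.

```python
def family_overlap_score(neighbours_a, neighbours_b, family_of):
    """Jaccard overlap of the gene families found in two co-expression neighbourhoods."""
    families_a = {family_of[g] for g in neighbours_a if g in family_of}
    families_b = {family_of[g] for g in neighbours_b if g in family_of}
    union = families_a | families_b
    if not union:
        return 0.0
    return len(families_a & families_b) / len(union)

def best_ortholog(query_neighbours, candidates, family_of):
    """Pick the candidate whose neighbourhood overlaps most with the query's."""
    scored = {
        gene: family_overlap_score(query_neighbours, neighbours, family_of)
        for gene, neighbours in candidates.items()
    }
    return max(scored, key=scored.get), scored

# Hypothetical family assignments for genes from two species.
family_of = {
    "At_a": "F1", "At_b": "F2", "At_c": "F3",
    "Os_a": "F1", "Os_b": "F2", "Os_d": "F4", "Os_e": "F5", "Os_f": "F6",
}
query_neighbours = ["At_a", "At_b", "At_c"]
candidates = {
    "Os_ortholog_1": ["Os_a", "Os_b", "Os_d"],
    "Os_ortholog_2": ["Os_d", "Os_e", "Os_f"],
}
best, scores = best_ortholog(query_neighbours, candidates, family_of)
print(best, scores)
```

When a gene has several candidate orthologs or inparalogs, ranking them by such a score is one way to pick the copy most likely to have kept the same function.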
2. Transcriptional Gene Regulatory Networks
Mejia-Guerra et al., 2012
Arabidopsis
- 1,700-2,500 transcription factors
- 180-791 miRNAs
- 2,708 expressed lncRNAs
49 Mb non-coding DNA
11,000 regulatory interactions (AtRegNet)
Phylogenetic footprinting: detection of Conserved Non-coding Sequences (CNS)
• Comparative analysis of non-coding DNA sequences to identify candidate regulatory elements (in orthologous genes)
• Regulatory elements are conserved during evolution due to functional constraint (vs. neutral carry-over)
• The power of phylogenetic footprinting is enhanced significantly when data are available from a number of related species that have diverged sufficiently
Developing an ensemble framework for phylogenetic footprinting in plants
• Application of motif mapping and different pairwise alignment tools
• Aggregate alignments into a multi-species footprint using 11 comparator dicot genomes
• Evaluate statistical significance, incl. FDR analysis
AtProbe
Feature map @ RSAT
144 regulatory elements (63 genes)
774 DNA motifs
From pairwise alignments to multi-species footprints
• Generate all pairwise alignments between the Arabidopsis query gene and its orthologs
• Map all pairwise alignments back to the reference promoter
• Count, per position, the number of species that support a footprint
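The per-position counting step above can be sketched in a few lines. The sketch assumes the pairwise alignments have already been mapped to reference promoter coordinates as 0-based, end-exclusive intervals. The species labels, the interval values, and the `min_species` threshold are illustrative assumptions, and the significance and FDR evaluation is not included.

```python
def species_support(promoter_length, aligned_regions):
    """Count, per promoter position, how many species have an alignment covering it.

    aligned_regions maps species -> list of (start, end) intervals in
    reference promoter coordinates (0-based, end-exclusive).
    """
    support = [0] * promoter_length
    for intervals in aligned_regions.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(max(0, start), min(promoter_length, end)))
        for pos in covered:  # each species counts at most once per position
            support[pos] += 1
    return support

def footprints(support, min_species):
    """Return maximal (start, end) runs where support >= min_species."""
    runs, start = [], None
    for pos, count in enumerate(support):
        if count >= min_species and start is None:
            start = pos
        elif count < min_species and start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(support)))
    return runs

# Hypothetical alignments of three comparator species on a 12-nt promoter.
aligned = {
    "species_1": [(2, 8)],
    "species_2": [(4, 10), (5, 7)],
    "species_3": [(6, 12)],
}
support = species_support(12, aligned)
print(support)
print(footprints(support, min_species=2))
```

Overlapping alignments from the same species are merged before counting, so a position is supported at most once per species regardless of how many alignment tools reported it.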
Evaluation of AtProbe experimental cis-regulatory elements
Significance of experimental motifs:
• G-box: Scmm ACGTGGC = 0.54, P value < 0.001
• GA motif: Scmm ATAGATAA = 0.09, P value 0.48
• I-box: Scmm GATAAGATT = 0.36, P value < 0.001
• Scmm TATATATA = 0.7, P value < 0.001
[Figure labels: RBCS1A, GAPA, ACA motif, C-motif]
Properties of CNS
• 69,361 CNSs associated with 17,895 genes
• Protein-coding genes (99%), miRNA genes (1%)
• Median length: 11 nt (min-max: 5-514 nt)
• CNSs cover 1,070 kb of the non-coding Arabidopsis genome
CNS identify in vivo functional targets
• Integrate known binding site data for 199 TFs
• Translate CNS into 40,758 TF-target interactions
• Compare with experimental functional TF targets: ChIP-Seq binding + TF binding site + differentially expressed during TF perturbation (n=2,708)
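Translating CNSs into TF-target interactions amounts to mapping known binding sites onto the conserved sequences of each gene. A minimal sketch, assuming exact consensus matching on both strands: the motifs ACGTGGC and TTTCCGC are taken from earlier slides, while the TF labels, gene names, and CNS sequences are hypothetical. Real binding-site data would typically use degenerate consensus strings or position weight matrices.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def tf_target_interactions(cns_by_gene, motif_by_tf):
    """Return sorted (TF, gene) pairs where the TF motif occurs in a CNS of the gene."""
    interactions = set()
    for gene, sequences in cns_by_gene.items():
        for tf, motif in motif_by_tf.items():
            rc = reverse_complement(motif)
            if any(motif in seq or rc in seq for seq in sequences):
                interactions.add((tf, gene))
    return sorted(interactions)

# Hypothetical CNS sequences per gene.
cns_by_gene = {
    "gene_1": ["TTACGTGGCAA"],   # contains ACGTGGC on the forward strand
    "gene_2": ["AAGCGGAAATT"],   # contains the reverse complement of TTTCCGC
    "gene_3": ["ATATATATAT"],    # no match to either motif
}
# Hypothetical TF labels attached to motifs quoted on the slides.
motif_by_tf = {"TF_gbox": "ACGTGGC", "TF_e2f": "TTTCCGC"}

interactions = tf_target_interactions(cns_by_gene, motif_by_tf)
print(interactions)
```

The resulting pairs are predictions only. The slide's comparison against ChIP-Seq binding and perturbation data is what indicates whether such motif hits correspond to functional targets.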
Genome-wide regulatory annotation
Collapsed TF-target module network
• 3,085 TF-target interactions
• 9/13 TFs show significant overlap with experimentally confirmed targets (AtRegNet)
Conclusions
• Co-expression analysis offers a pragmatic means to
  - explore new gene functions in plants
  - sort out many-to-many orthologs across species
• Integration of complementary experimental data sources enlarges our view on functional gene modules
• Enhanced multi-species phylogenetic footprinting method to identify genome-wide conserved non-coding sequences (CNS)
• Integration of CNS with complementary experimental data sources offers new possibilities for regulatory gene annotation in plants
Further reading
Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant Cell Environ 35, 1787-1798.
Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol 159, 884-901.
Proost, S.*, Van Bel, M.*, Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21, 3718-3731.
Van Bel, M., Proost, S., Van Neste, C., Deforce, D., Van de Peer, Y., and Vandepoele, K. (2013). TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes. Genome Biol 14, R134.
