SlideShare a Scribd company logo
Unravelling transcription factor functions
through integrative inference of transcriptional
networks in Arabidopsis
KlaasVandepoele
Department of Plant Biotechnology and Bioinformatics, UGent
VIB Center for Plant Systems Biology
Utrecht
Artificial intelligence in plant science and breeding
24 February 2021
plaza_genomics
Comparative Network Biology -Vandepoele lab
• Extract biological knowledge from large-scale experimental data sets using data
integration, comparative sequence & expression analysis, and network biology, to improve
our understanding of gene functions and regulation in plants and diatoms.
Plant Gene Regulatory Networks Comparative functional genomics
• PLAZA 4.0: an integrative resource for functional, evolutionary and
comparative plant genomics. Van Bel et al., 2018, Nucleic Acids Res
• Curse: building expression atlases and co-expression networks
from public RNA-Seq data. Vaneechoutte D, Vandepoele K.
Bioinformatics. 2019
• TF2Network: predicting transcription factor regulators and gene
regulatory networks in Arabidopsis using publicly available binding
site information. Kulkarni et al., 2018, Nucleic Acids Res
• Enhanced maps of transcription factor binding sites improve
regulatory networks learned from accessible chromatin data.
Kulkarni et al., Plant Physiol. 2019
Mapping of Gene Regulatory Networks (GRNs)
Mejia-Guerra et al., 2012
Arabidopsis
-1,700-2,500 Transcription Factors
- 180-791 miRNA
- 2,708 expressed lncRNA
49MB non-coding DNA
AtRegNet:
17,224 regulatory interactions
Experimental characterization of transcriptional activity and
regulatory control
ENCODE
How to integrate the biological knowledge captured by different –omics layers
to build better networks reporting functional regulatory interactions?
1.TF ChIP-Seq
• in vivo method to measure protein-DNA
interactions using chromatin immuno-
precipitation
• Different cellular conditions can be profiled
TF ChIP-Seq
Furey et al., 2012
ChIP
Gene
annotation
Reads
sample
Control
Peak
/
Motif
Read
pileup
Output ChIP-Seq peak calling procedure displayed in genome browser
Position Weight Matrix
(PWM)
TF
target gene
2. in vitroTF binding specificities
Wang et al., 2011
ModelTF binding site as
Position Weight Matrix (PWM)
based on k-mer signals
Protein binding
microarray
target gene
Arabidopsis: PWMs for 990TFs
PWM
3. DNase-seq - Profiling of accesible chromatin
Hesselberth et al, 2009 binding site TF protein
DH footprint
DNase I hypersensitive site
(DHS)
DHS+PWM
 Map all known PWMs on the
promoters of the Arabidopsis query
gene and its orthologs
 Count per PWM position the #species
that support a TF binding site
 Significance estimation (FDR<10%)
Van deVelde et al., Plant Phys 2016 -A Collection of Conserved Non-Coding Sequences to StudyGene Regulation in Flowering Plants.
Conserved PWM
4. Detection of conservedTF binding sites using
phylogenetic footprinting
5-6. Network inference based on expression data
TF
target
Expression-based network inference
GENIE3 - Huynh-Thu et al., 2013
GENIE3
COE TF-target
GENIE3
7. Co-expression + PWM enrichment
• Integrate co-regulatory gene expression data withTF binding sites (PWMs)
PWM enrichment in kNN co-expression cluster
(hypergeometric distribution)
COE+PWM
Benchmarking of different methods to map gene regulatory
networks
Gold standard: 5.7k interactions covering 522 TFs (AtRegNet)
Test set: 20% of gold standard (80% used for training)
Benchmarking of different methods to map gene regulatory
networks
JanVan deVelde
Gold standard: 5.7k interactions covering 522TFs (AtRegNet+literature)
Test set: 20% of gold standard (80% used for training later)
Supervised learning: a network-based approach for large-
scale functional data integration
Marbach et al., Genome Research 2012
Gradient Boosting Machine
• 1000 trees (shrinkage of 0.01, interaction depth 3, 10-fold CV training)
• 80% training data withTrue:False sampling ratio of 3:1
• 7 input networks
d Motifs of TFs ChIP Binding of TFs TF Motif Occurence in
Open Chromatin
TF-Gene Co-Expression
targets
1800 TFs
Supervised Network
Known Interactions
Training Data
Test Data
Cross-validation
Input Features
Classifier
Target gene
TF gene
COE TF-target GENIE3 COE+PWM
DHS+PWM Conserved PWM
PWM
Supervised Learning max F1: 1793k interactions – 1766TFs
Gold standard
Max F1 network -Test set
Recall: 46%
Precision: 71%
F1-measure: 57%
ChIP
Performance supervised learning network (iGRN)
Supervised Learning max F1
Different support of input networks for iGRN
Supervised Learning max F1
iGRN captures functionalTF – target gene interactions
-- overlap target genes not significant (p-value hypergeometric distribution > 0.05)
--
34/40TFs have significant overlap between predicted target genes and DE
genes afterTF perturbation
--
--
--
--
--
iGRN-based functional annotation ofTFs
Recovery of experimental Gene Ontology Biological
Process annotations for TFs with known function
TF
TF function
Target genes
• Recovery of known experimentally-
supported functions for >600 TFs
• Novel functional predictions for 268
unknown TFs
• Highly complementary with AraNet v2
iGRN
catalase
H2O2 H20
3-AT
H2O2
PSI
MV
photorespiration
CIII
AntA
O2
- H2O2
O2
- H2O2
H2O2
Retrograde
signaling
Defense response
PCD
Oxidative stress signaling
In house dataset of ROS marker genes
Willems et al., 2016, Plant Physiology
Van Breusegem lab - VIB
Prediction and evaluation of novel oxidative stressTFs
(“ROS-TFs”)
Target gene enrichment:
• ROS wheel
• GO-BP ‘response to oxidative stress’
NovelTF
functions
(e.g. oxidative stress
responses)
TF Rank ID Gene name q-val
enrichme
nt Phenotype
1AT5G63790 ANAC102,NAC102 1,84E-35 22,74Oxidative (our data)
2AT3G55980 ATSZF1,SZF1 5,10E-26 32,71Salt
3AT2G37430 ZAT11 2,81E-25 14,41Oxidative (paraquat, Ni)
4AT2G40140 ATSZF2,CZF1 6,41E-25 17,45Salt
5AT5G59820 AtZAT12,RHL41,ZAT12 8,70E-25 18,64Oxidative and abiotic
6AT5G24110 ATWRKY30,WRKY30 1,42E-24 6,82oxidative, salt (at early developmental stage)
7AT2G38470 ATWRKY33,WRKY33 5,03E-24 6,85Pathogen, salt, heat stress
8AT2G46400 ATWRKY46,WRKY46 9,78E-24 5,79Osmotic/salt
9AT4G17500 ATERF-1,ERF-1 2,57E-23 9,42Biotic …
10AT1G28370 ATERF11,ERF11 3,10E-23 6,75Osmotic, ET signaling
11AT4G18880 AT-HSFA4A,HSF21 5,09E-23 6,39oxidative, salt
12AT2G23320 AtWRKY15,WRKY15 7,46E-23 6,31Oxidative, salt
13AT5G04340 C2H2,CZF2,ZAT6 7,52E-23 14,09
Cadmium, salt, osmotic stress, P deficiency,
pathogen, drought, cold
14AT1G80840 ATWRKY40,WRKY40 8,93E-23 6,51Biotic, MRS
15AT3G23250 ATMYB15,ATY19 1,86E-22 5,71Drought, cold
16AT1G27730 STZ,ZAT10 4,90E-22 9,99Oxidative transcripts, abiotic, ….
17AT5G49520 ATWRKY48,WRKY48 5,36E-22 11,26Biotic
18AT5G13080 ATWRKY75,WRKY75 1,40E-21 5,81Phosphate starvation, biotic
19AT4G17230 SCL13 3,76E-21 16,60Phytochrome dependent light signaling
20AT5G59450 AT5G59450 5,14E-21 19,95cell division
21AT1G42990 ATBZIP60,BZIP60 1,36E-20 5,46ER, unfolded protein
22AT1G18570 AtMYB51,BW51A 1,84E-20 5,82glucosinolate biosynthesis
23AT4G23810 ATWRKY53,WRKY53 6,64E-20 7,56
leaf senescence and regulation of oxidative
stress genes
24AT1G66550 ATWRKY67,WRKY67 3,25E-19 8,88None; no lines available
25AT3G23220 ESE1 3,54E-19 7,50Salt, ET signaling
26AT5G47230 ATERF5,ATERF-5 3,60E-19 6,00Biotic
27AT3G54810 BME3,BME3-ZF,GATA8 2,01E-18 14,92Germination, salt/drought
28AT5G47220 ATERF2,ATERF-2,ERF2 4,16E-18 5,57Biotic
29AT2G40740 ATWRKY55,WRKY55 6,69E-18 5,70None
30AT4G36990 ATHSF4,AT-HSFB1,HSF4 1,94E-17 5,98
similar to heat shock factor, no known
phenotype
31AT2G30250 ATWRKY25,WRKY25 1,59E-16 4,81Biotic, salt
32AT4G31550 ATWRKY11,WRKY11 2,03E-16 5,33Biotic
33AT5G01380 GT3a 2,15E-16 4,93None
34AT5G22570 ATWRKY38,WRKY38 2,19E-16 6,01Biotic
35AT3G23240 ATERF1,ERF1 2,36E-15 4,16ET
36AT4G17490 ATERF6,ERF6,ERF-6-6 3,13E-15 5,26Oxidative
37AT4G22950 AGL19,GL19 3,37E-15 15,26flowering
38AT3G10500 ANAC053,NAC053,NTL4 3,63E-15 5,02Oxidative/ROS
39AT3G49530 ANAC062,NAC062,NTL6 1,76E-14 7,24ER, unfolded protein
40AT5G62020 AT-HSFB2A,HSF6,HSFB2A 5,87E-14 5,87
41AT1G22070 TGA3 6,60E-14 5,33biotic
42AT1G67970 AT-HSFA8,HSFA8 7,97E-14 11,40redox dependent nucleus translocation
…
Unknown or no
stress-related
function
Oxidative stress
function
Other (a/biotic)
stress function
Ranking based on ROS wheel target
genes enrichment (n=124 TFs)
Functional validation of the predicted ROS-TFs
Inge De Clercq
13/32 regulators were
validated for a function
in ROS responses by
phenotyping
Rank – TF - perturbation
Phenotypes for predicted ROS-TFs
iGRN identified novel ROSTFs from the GRAS, BES1 and GATA families
Expression patterns for novel ROS-TFs
Responsiveness to a wide range of
oxidative stress conditions?
• 14/17 known ROS TFs
• 6/13 novel ROS TFs
Many novel ROS TFs would not have been
predicted solely relying on differential
expression at the whole plant or organ
level!
Conclusions
JanVan deVelde Inge De Clercq
 Different regulatory –omics data types as well as advanced
computational integration methods contribute significantly to the
improved delineation of high-quality gene regulatory networks
 TF binding site-based as well as expression-based regulatory
networks offer a complementary view on functional gene
regulatory interactions
 Gene regulatory networks obtained by supervised learning are a
starting point for
 the systematic functional/regulatory annotation of all Arabidopsis
genes
 new biological discoveries
Li Liu, DriesVaneechoutte
Robin Pottie, Xiaopeng Liu, FrankVan Breusegem
Further reading
Curse: Building expression atlases and co-expression networks from public
RNA-Seq data. Vaneechoutte and Vandepoele (2019) Bioinformatics
TF2Network: predicting transcription factor regulators and gene regulatory
networks in Arabidopsis using publicly available binding site information
Kulkarni, Vaneechoutte, Van de Velde and Vandepoele (2018). Nucleic Acids
Research

More Related Content

What's hot

Anna university-ug-pg-ppt-presentation-format
Anna university-ug-pg-ppt-presentation-formatAnna university-ug-pg-ppt-presentation-format
Anna university-ug-pg-ppt-presentation-format
Veera Victory
 
How to create an IPR Strategy for startups and Basics of IPR
How to create an IPR Strategy for startups and Basics of IPRHow to create an IPR Strategy for startups and Basics of IPR
How to create an IPR Strategy for startups and Basics of IPR
Inolyst
 

What's hot (15)

F test and ANOVA
F test and ANOVAF test and ANOVA
F test and ANOVA
 
Delphi method
Delphi methodDelphi method
Delphi method
 
Intellectual Property Rights (IPR)
Intellectual Property Rights (IPR)Intellectual Property Rights (IPR)
Intellectual Property Rights (IPR)
 
Anova ppt
Anova pptAnova ppt
Anova ppt
 
Presentation about Delphi Method
Presentation about Delphi Method Presentation about Delphi Method
Presentation about Delphi Method
 
Assistive technology presentation
Assistive technology presentationAssistive technology presentation
Assistive technology presentation
 
PhD Viva PPT
PhD Viva PPTPhD Viva PPT
PhD Viva PPT
 
Intellectual Property rights
Intellectual Property rightsIntellectual Property rights
Intellectual Property rights
 
Anna university-ug-pg-ppt-presentation-format
Anna university-ug-pg-ppt-presentation-formatAnna university-ug-pg-ppt-presentation-format
Anna university-ug-pg-ppt-presentation-format
 
Data types
Data typesData types
Data types
 
Patent system of india
Patent system of indiaPatent system of india
Patent system of india
 
Presentation on Intellectual Property Rights
Presentation on Intellectual Property RightsPresentation on Intellectual Property Rights
Presentation on Intellectual Property Rights
 
How to create an IPR Strategy for startups and Basics of IPR
How to create an IPR Strategy for startups and Basics of IPRHow to create an IPR Strategy for startups and Basics of IPR
How to create an IPR Strategy for startups and Basics of IPR
 
Introduction to Intellectual Property
Introduction to Intellectual PropertyIntroduction to Intellectual Property
Introduction to Intellectual Property
 
Anova, ancova
Anova, ancovaAnova, ancova
Anova, ancova
 

Similar to Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators

Using the Ondex system for exploring Arabidopsis regulatory networks
Using the Ondex system for exploring Arabidopsis regulatory networksUsing the Ondex system for exploring Arabidopsis regulatory networks
Using the Ondex system for exploring Arabidopsis regulatory networks
Catherine Canevet
 
Bioinfomatics Presentation
Bioinfomatics PresentationBioinfomatics Presentation
Bioinfomatics Presentation
Zhenhong Bao
 
Metabolomics basics_MKKB1103_biotech for engineers
Metabolomics basics_MKKB1103_biotech for engineersMetabolomics basics_MKKB1103_biotech for engineers
Metabolomics basics_MKKB1103_biotech for engineers
ZanariahHashim1
 
Precision Oncology
Precision OncologyPrecision Oncology
Precision Oncology
Assay Engineers
 
Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.
Robert Bruce
 

Similar to Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators (20)

TF2Network: unravelling gene regulatory networks and transcription factor fun...
TF2Network: unravelling gene regulatory networks and transcription factor fun...TF2Network: unravelling gene regulatory networks and transcription factor fun...
TF2Network: unravelling gene regulatory networks and transcription factor fun...
 
A functional and evolutionary perspective on transcription factor binding in ...
A functional and evolutionary perspective on transcription factor binding in ...A functional and evolutionary perspective on transcription factor binding in ...
A functional and evolutionary perspective on transcription factor binding in ...
 
Tair workshop stanford2017
Tair workshop stanford2017Tair workshop stanford2017
Tair workshop stanford2017
 
Inferring gene functions and regulatory interactions in plants using differen...
Inferring gene functions and regulatory interactions in plants using differen...Inferring gene functions and regulatory interactions in plants using differen...
Inferring gene functions and regulatory interactions in plants using differen...
 
2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai2 partners ed_kickoff_dtai
2 partners ed_kickoff_dtai
 
Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...Genome walking – a new strategy for identification of nucleotide sequence in ...
Genome walking – a new strategy for identification of nucleotide sequence in ...
 
“PHA PRODUCTION FROM RECOMBINANT E.COLI EXPRESSING PHAC1 GENE TO PRODUCE POLY...
“PHA PRODUCTION FROM RECOMBINANT E.COLI EXPRESSING PHAC1 GENE TO PRODUCE POLY...“PHA PRODUCTION FROM RECOMBINANT E.COLI EXPRESSING PHAC1 GENE TO PRODUCE POLY...
“PHA PRODUCTION FROM RECOMBINANT E.COLI EXPRESSING PHAC1 GENE TO PRODUCE POLY...
 
Using the Ondex system for exploring Arabidopsis regulatory networks
Using the Ondex system for exploring Arabidopsis regulatory networksUsing the Ondex system for exploring Arabidopsis regulatory networks
Using the Ondex system for exploring Arabidopsis regulatory networks
 
Molecular characterization of Pst isolates from Western Canada
Molecular characterization of Pst isolates from Western CanadaMolecular characterization of Pst isolates from Western Canada
Molecular characterization of Pst isolates from Western Canada
 
Bioinfomatics Presentation
Bioinfomatics PresentationBioinfomatics Presentation
Bioinfomatics Presentation
 
MDC Connects: Cell-based screening: Old dogs with new tricks
MDC Connects: Cell-based screening: Old dogs with new tricksMDC Connects: Cell-based screening: Old dogs with new tricks
MDC Connects: Cell-based screening: Old dogs with new tricks
 
In vitro and high throughput screening (HTS) assays
In vitro and high throughput screening (HTS) assaysIn vitro and high throughput screening (HTS) assays
In vitro and high throughput screening (HTS) assays
 
GPU-accelerated Virtual Screening
GPU-accelerated Virtual ScreeningGPU-accelerated Virtual Screening
GPU-accelerated Virtual Screening
 
Metabolomics basics_MKKB1103_biotech for engineers
Metabolomics basics_MKKB1103_biotech for engineersMetabolomics basics_MKKB1103_biotech for engineers
Metabolomics basics_MKKB1103_biotech for engineers
 
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
 
2011 Rna Course Part 1
2011 Rna Course Part 12011 Rna Course Part 1
2011 Rna Course Part 1
 
Precision Oncology
Precision OncologyPrecision Oncology
Precision Oncology
 
Multiplex Assays for Studying Gene Regulation and Cell Function
Multiplex Assays for Studying Gene Regulation and Cell FunctionMultiplex Assays for Studying Gene Regulation and Cell Function
Multiplex Assays for Studying Gene Regulation and Cell Function
 
Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.
 
A next generation sequencing based sample-to-result pharmacogenomics research...
A next generation sequencing based sample-to-result pharmacogenomics research...A next generation sequencing based sample-to-result pharmacogenomics research...
A next generation sequencing based sample-to-result pharmacogenomics research...
 

Recently uploaded

The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...
Sérgio Sacani
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Sérgio Sacani
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
Seminar on Halal AGriculture and Fisheries.pptx
Seminar on Halal AGriculture and Fisheries.pptxSeminar on Halal AGriculture and Fisheries.pptx
Seminar on Halal AGriculture and Fisheries.pptx
RUDYLUMAPINET2
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 

Recently uploaded (20)

The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...The importance of continents, oceans and plate tectonics for the evolution of...
The importance of continents, oceans and plate tectonics for the evolution of...
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
FAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS imagesFAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS images
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Seminar on Halal AGriculture and Fisheries.pptx
Seminar on Halal AGriculture and Fisheries.pptxSeminar on Halal AGriculture and Fisheries.pptx
Seminar on Halal AGriculture and Fisheries.pptx
 
Shuaib Y-basedComprehensive mahmudj.pptx
Shuaib Y-basedComprehensive mahmudj.pptxShuaib Y-basedComprehensive mahmudj.pptx
Shuaib Y-basedComprehensive mahmudj.pptx
 
INSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere UniversityINSIGHT Partner Profile: Tampere University
INSIGHT Partner Profile: Tampere University
 
Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent University
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
A Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthA Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on Earth
 
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
 
Topography and sediments of the floor of the Bay of Bengal
Topography and sediments of the floor of the Bay of BengalTopography and sediments of the floor of the Bay of Bengal
Topography and sediments of the floor of the Bay of Bengal
 

Integrative inference of transcriptional networks in Arabidopsis yields novel ROS signalling regulators

  • 1. Unravelling transcription factor functions through integrative inference of transcriptional networks in Arabidopsis KlaasVandepoele Department of Plant Biotechnology and Bioinformatics, UGent VIB Center for Plant Systems Biology Utrecht Artificial intelligence in plant science and breeding 24 February 2021 plaza_genomics
  • 2. Comparative Network Biology -Vandepoele lab • Extract biological knowledge from large-scale experimental data sets using data integration, comparative sequence & expression analysis, and network biology, to improve our understanding of gene functions and regulation in plants and diatoms. Plant Gene Regulatory Networks Comparative functional genomics • PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Van Bel et al., 2018, Nucleic Acids Res • Curse: building expression atlases and co-expression networks from public RNA-Seq data. Vaneechoutte D, Vandepoele K. Bioinformatics. 2019 • TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. Kulkarni et al., 2018, Nucleic Acids Res • Enhanced maps of transcription factor binding sites improve regulatory networks learned from accessible chromatin data. Kulkarni et al., Plant Physiol. 2019
  • 3. Mapping of Gene Regulatory Networks (GRNs) Mejia-Guerra et al., 2012 Arabidopsis -1,700-2,500 Transcription Factors - 180-791 miRNA - 2,708 expressed lncRNA 49MB non-coding DNA AtRegNet: 17,224 regulatory interactions
  • 4. Experimental characterization of transcriptional activity and regulatory control ENCODE How to integrate the biological knowledge captured by different –omics layers to build better networks reporting functional regulatory interactions?
  • 5. 1.TF ChIP-Seq • in vivo method to measure protein-DNA interactions using chromatin immuno- precipitation • Different cellular conditions can be profiled TF ChIP-Seq Furey et al., 2012 ChIP
  • 6. Gene annotation Reads sample Control Peak / Motif Read pileup Output ChIP-Seq peak calling procedure displayed in genome browser Position Weight Matrix (PWM) TF target gene
  • 7. 2. in vitroTF binding specificities Wang et al., 2011 ModelTF binding site as Position Weight Matrix (PWM) based on k-mer signals Protein binding microarray target gene Arabidopsis: PWMs for 990TFs PWM
  • 8. 3. DNase-seq - Profiling of accesible chromatin Hesselberth et al, 2009 binding site TF protein DH footprint DNase I hypersensitive site (DHS) DHS+PWM
  • 9.  Map all known PWMs on the promoters of the Arabidopsis query gene and its orthologs  Count per PWM position the #species that support a TF binding site  Significance estimation (FDR<10%) Van deVelde et al., Plant Phys 2016 -A Collection of Conserved Non-Coding Sequences to StudyGene Regulation in Flowering Plants. Conserved PWM 4. Detection of conservedTF binding sites using phylogenetic footprinting
  • 10. 5-6. Network inference based on expression data TF target Expression-based network inference GENIE3 - Huynh-Thu et al., 2013 GENIE3 COE TF-target GENIE3
  • 11. 7. Co-expression + PWM enrichment • Integrate co-regulatory gene expression data withTF binding sites (PWMs) PWM enrichment in kNN co-expression cluster (hypergeometric distribution) COE+PWM
  • 12. Benchmarking of different methods to map gene regulatory networks Gold standard: 5.7k interactions covering 522 TFs (AtRegNet) Test set: 20% of gold standard (80% used for training)
  • 13. Benchmarking of different methods to map gene regulatory networks JanVan deVelde Gold standard: 5.7k interactions covering 522TFs (AtRegNet+literature) Test set: 20% of gold standard (80% used for training later)
  • 14. Supervised learning: a network-based approach for large- scale functional data integration Marbach et al., Genome Research 2012 Gradient Boosting Machine • 1000 trees (shrinkage of 0.01, interaction depth 3, 10-fold CV training) • 80% training data withTrue:False sampling ratio of 3:1 • 7 input networks d Motifs of TFs ChIP Binding of TFs TF Motif Occurence in Open Chromatin TF-Gene Co-Expression targets 1800 TFs Supervised Network Known Interactions Training Data Test Data Cross-validation Input Features Classifier Target gene TF gene COE TF-target GENIE3 COE+PWM DHS+PWM Conserved PWM PWM Supervised Learning max F1: 1793k interactions – 1766TFs Gold standard Max F1 network -Test set Recall: 46% Precision: 71% F1-measure: 57% ChIP
  • 15. Performance supervised learning network (iGRN) Supervised Learning max F1
  • 16. Different support of input networks for iGRN Supervised Learning max F1
  • 17. iGRN captures functionalTF – target gene interactions -- overlap target genes not significant (p-value hypergeometric distribution > 0.05) -- 34/40TFs have significant overlap between predicted target genes and DE genes afterTF perturbation -- -- -- -- --
  • 18. iGRN-based functional annotation ofTFs Recovery of experimental Gene Ontology Biological Process annotations for TFs with known function TF TF function Target genes • Recovery of known experimentally- supported functions for >600 TFs • Novel functional predictions for 268 unknown TFs • Highly complementary with AraNet v2 iGRN
  • 19. catalase H2O2 H20 3-AT H2O2 PSI MV photorespiration CIII AntA O2 - H2O2 O2 - H2O2 H2O2 Retrograde signaling Defense response PCD Oxidative stress signaling In house dataset of ROS marker genes Willems et al., 2016, Plant Physiology Van Breusegem lab - VIB
  • 20. Prediction and evaluation of novel oxidative stressTFs (“ROS-TFs”) Target gene enrichment: • ROS wheel • GO-BP ‘response to oxidative stress’ NovelTF functions (e.g. oxidative stress responses) TF Rank ID Gene name q-val enrichme nt Phenotype 1AT5G63790 ANAC102,NAC102 1,84E-35 22,74Oxidative (our data) 2AT3G55980 ATSZF1,SZF1 5,10E-26 32,71Salt 3AT2G37430 ZAT11 2,81E-25 14,41Oxidative (paraquat, Ni) 4AT2G40140 ATSZF2,CZF1 6,41E-25 17,45Salt 5AT5G59820 AtZAT12,RHL41,ZAT12 8,70E-25 18,64Oxidative and abiotic 6AT5G24110 ATWRKY30,WRKY30 1,42E-24 6,82oxidative, salt (at early developmental stage) 7AT2G38470 ATWRKY33,WRKY33 5,03E-24 6,85Pathogen, salt, heat stress 8AT2G46400 ATWRKY46,WRKY46 9,78E-24 5,79Osmotic/salt 9AT4G17500 ATERF-1,ERF-1 2,57E-23 9,42Biotic … 10AT1G28370 ATERF11,ERF11 3,10E-23 6,75Osmotic, ET signaling 11AT4G18880 AT-HSFA4A,HSF21 5,09E-23 6,39oxidative, salt 12AT2G23320 AtWRKY15,WRKY15 7,46E-23 6,31Oxidative, salt 13AT5G04340 C2H2,CZF2,ZAT6 7,52E-23 14,09 Cadmium, salt, osmotic stress, P deficiency, pathogen, drought, cold 14AT1G80840 ATWRKY40,WRKY40 8,93E-23 6,51Biotic, MRS 15AT3G23250 ATMYB15,ATY19 1,86E-22 5,71Drought, cold 16AT1G27730 STZ,ZAT10 4,90E-22 9,99Oxidative transcripts, abiotic, …. 17AT5G49520 ATWRKY48,WRKY48 5,36E-22 11,26Biotic 18AT5G13080 ATWRKY75,WRKY75 1,40E-21 5,81Phosphate starvation, biotic 19AT4G17230 SCL13 3,76E-21 16,60Phytochrome dependent light signaling 20AT5G59450 AT5G59450 5,14E-21 19,95cell division 21AT1G42990 ATBZIP60,BZIP60 1,36E-20 5,46ER, unfolded protein 22AT1G18570 AtMYB51,BW51A 1,84E-20 5,82glucosinolate biosynthesis 23AT4G23810 ATWRKY53,WRKY53 6,64E-20 7,56 leaf senescence and regulation of oxidative stress genes 24AT1G66550 ATWRKY67,WRKY67 3,25E-19 8,88None; no lines available 25AT3G23220 ESE1 3,54E-19 7,50Salt, ET signaling 26AT5G47230 ATERF5,ATERF-5 3,60E-19 6,00Biotic 27AT3G54810 BME3,BME3-ZF,GATA8 2,01E-18 14,92Germination, salt/drought 28AT5G47220 ATERF2,ATERF-2,ERF2 4,16E-18 5,57Biotic 29AT2G40740 ATWRKY55,WRKY55 6,69E-18 5,70None 30AT4G36990 ATHSF4,AT-HSFB1,HSF4 1,94E-17 5,98 similar to heat shock factor, no known phenotype 31AT2G30250 ATWRKY25,WRKY25 1,59E-16 4,81Biotic, salt 32AT4G31550 ATWRKY11,WRKY11 2,03E-16 5,33Biotic 33AT5G01380 GT3a 2,15E-16 4,93None 34AT5G22570 ATWRKY38,WRKY38 2,19E-16 6,01Biotic 35AT3G23240 ATERF1,ERF1 2,36E-15 4,16ET 36AT4G17490 ATERF6,ERF6,ERF-6-6 3,13E-15 5,26Oxidative 37AT4G22950 AGL19,GL19 3,37E-15 15,26flowering 38AT3G10500 ANAC053,NAC053,NTL4 3,63E-15 5,02Oxidative/ROS 39AT3G49530 ANAC062,NAC062,NTL6 1,76E-14 7,24ER, unfolded protein 40AT5G62020 AT-HSFB2A,HSF6,HSFB2A 5,87E-14 5,87 41AT1G22070 TGA3 6,60E-14 5,33biotic 42AT1G67970 AT-HSFA8,HSFA8 7,97E-14 11,40redox dependent nucleus translocation … Unknown or no stress-related function Oxidative stress function Other (a/biotic) stress function Ranking based on ROS wheel target genes enrichment (n=124 TFs)
  • 21. Functional validation of the predicted ROS-TFs Inge De Clercq
  • 22. 13/32 regulators were validated for a function in ROS responses by phenotyping Rank – TF - perturbation
  • 23. Phenotypes for predicted ROS-TFs iGRN identified novel ROSTFs from the GRAS, BES1 and GATA families
  • 24. Expression patterns for novel ROS-TFs Responsiveness to a wide range of oxidative stress conditions? • 14/17 known ROS TFs • 6/13 novel ROS TFs Many novel ROS TFs would not have been predicted solely relying on differential expression at the whole plant or organ level!
  • 25. Conclusions JanVan deVelde Inge De Clercq  Different regulatory –omics data types as well as advanced computational integration methods contribute significantly to the improved delineation of high-quality gene regulatory networks  TF binding site-based as well as expression-based regulatory networks offer a complementary view on functional gene regulatory interactions  Gene regulatory networks obtained by supervised learning are a starting point for  the systematic functional/regulatory annotation of all Arabidopsis genes  new biological discoveries Li Liu, DriesVaneechoutte Robin Pottie, Xiaopeng Liu, FrankVan Breusegem
  • 26. Further reading Curse: Building expression atlases and co-expression networks from public RNA-Seq data. Vaneechoutte and Vandepoele (2019) Bioinformatics TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information Kulkarni, Vaneechoutte, Van de Velde and Vandepoele (2018). Nucleic Acids Research

Editor's Notes

  1. Good afternoon. Thanks to the organizers for the invitation to present our work today.
  2. Through the development and application of various bioinformatics methods, we try to identify new aspects of genome biology, especially in the area of gene function prediction, gene regulation and evolutionary/systems biology.
  3. For today, the playground will be Arabidopsis, a plant model with a fairly simple genome, but still containing thousands of regulators. When mapping gene regulatory networks, the goal is to identify which TF binds to the promoter of which target gene, and to determine what is the functional regulatory consequence for growth, development or stress response. Especially the latter criterion is important, as we know that many in vivo TF binding events do not lead to changes in transcriptional activity of the target gene.
  4. Thus, if we focus on different molecular profiling methods assessing chromatin state, TF binding or transcriptional activity, a major challenge is … In the next slides, I will very briefly introduce different experimental and computational methods to map GRNs, which will be discussed, compared and integrated using ML. In a second part, I will show how such an integrated network has a good predictive power to study TF functions.
  5. Thus, in total I will present 7 approaches, which will be input networks, which all have their strengths and weakness, for example with respect to completeness and accuracy.
  6. Apart from capturing potential TF binding events, which can be considered as an input signal to transcriptional control, it is also possible to focus on output signals, which is for example the spatial-temporal expression of all genes in the genome. - GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. - In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods Random Forests or Extra-Trees. - The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. -Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed
  7. PWM mapping: FP problem PWM mapping – reduce witthrough integration co-regulatory information
  8. Explain all input networks again + size of the networks. How good do these capture true regulatory events?
  9. Given the strenghts and weaknesses of these different networks, a network-based approach for large-scale functional data integration was applied. Gradient boosting produces a prediction model for classification using an ensemble of weaker prediction models. The family of boosting methods is based on a constructive strategy of ensemble formation. The main idea of boosting is to add new models to the ensemble sequentially. At each particular iteration, a new weak, base-learner model is trained with respect to the error of the whole ensemble learnt so far.  Model = Decision trees; Posterior prob to be a true interaction: 0.35
  10. 22, we can then show results for recent ChIP-Seq data + TF DE gene sets (incl. significance overlap using hypergeometric test).
  11. 70% regulatory interactions confirmed by PWM, 50% by PWM in DHS (accessible TFBS) 50% regulatory interactions confirmed by GENIE3, so also regulated info well integrated in the network!
  12. Also very good overlap with recent TF ChIP-Seq datasets.
  13. >600 known TF-GO BP annotations recovered ; examples cover Abiotic stress, development, hormone signalling
  14. Given the good performance to predict known TFs functions, we next wanted to experimentally evaluate novel functional predictions. Excessive production of ROS can cause damage and lead to cell death, but on the other hand, ROS can also serve as signaling molecules in oxidative stress responses. We recently obtained a dataset of robust ROS marker genes, called the ROS wheel, and we were interested in identifying the regulators of these ROS genes.
  15. We identified TFs for which the predicted target genes are enriched for ROS wheel markers and oxidative stress functions. We obtained a list of 124 TFs what we call the ROS-TFs and ranked based on ROS wheel target genes enrichment. In dark blue are 17 TFs that have been shown to have a phenotype under oxidative stress conditions. 58 TFs in light blue play a role under other stress conditions. The 49 TFs in orange have no known role in oxidative stress and these are our novel candidate ROS-TFs that we want to functionally validate. Considering the top 124 TFs, 17 TFs with known oxidative stress function (dark blue) – low ranks! 58 TFs with other stress function 49 TFs no known role in ROS/environmental stress
  16. For experimental validation, we selected the top 32 novel ROS-TFs, for which at least one Arabidopsis loss- or gain-of-function transgenic line was available. Plant performance under oxidative stress was assessed based on rosette growth of the mutant lines compared to the wild type when grown at low and high MV and 3-AT concentrations, respectively, using automated image analysis. In addition, bleaching, indicating chlorosis symptoms, were visually scored. ROS causing agents: MV : methyl viologen 3-AT: 3-amino triazole
  17. For 13 of the 32 phenotyped novel ROS TFs we found at least one mutant allele with significant oxidative stress-induced changes in rosette area relative to the wild type, thus indicating a genotype effect dependent on the oxidative stress treatment and/or genotype-by-treatment effect under at least one of the tested stress treatments. Explain figure. For SCL13, WRKY45, BEH2 and PHL1, experimental evidence is provided by at least two mutant alleles.