Deciphering the regulatory code in the genome

PhD completion seminar
Denis C. Bauer

Institute for Molecular Bioscience
The University of Queensland,
Australia

By yankodesign by linh.ngân

Research Aim
Develop a method (a thermodynamic model) that translates the
regulatory message in the DNA into when and how strongly a gene
is expressed.

[Figure: the DNA sequence AAGAAGGTTTTAGTTTAGCC... is passed through the thermodynamic model and read as "Express gene with 70% capacity when it is hot, Thanks!"]

Why is understanding transcriptional
regulation important?
•  Insight into the biology of gene pathways.
•  Search for regulatory regions with specific function.
•  “Re-programming” of genes has therapeutic
potential.

[Figure: DNA with promoter, gene, and transcription. Left: a broken regulatory element. Right: design and insert a new regulatory element.]

What do we need to know
to build a model able
to translate the regulatory
message?

Background: Enhancer
•  Genes can have independent “switches” (enhancers)
beyond the core promoter, which can start the
transcription of the target gene under different
conditions.

[Figure: gene with promoter and several enhancer regions driving transcription.]

Background: Enhancer
•  Transcription is regulated by the binding of activator
and repressor TFs to an enhancer region.

[Figure: enhancer with its binding site map. TF concentration: 8 activators, 2 repressors. Result: active transcription.]

Background: Repression
•  Transcriptional regulation is also dependent on the
interplay between activators and repressors, i.e.
where they bind relative to each other.
[Figure: enhancer with its binding site map, showing the repressor range.]

On which system would
we test the model’s
abilities?

Background: Even-skipped gene (eve)
Drosophila melanogaster 1

Embryo stained for eve 2

Function representation 3

1 http://insects.eugenes.org/
2 Small et al.
3 http://bioinform.genetika.ru

Background: Regulation of eve
[Figure: eve locus with its regulatory elements in order: Late1, MSE 3+7, MSE 2, eve promoter (P), late2, MSE 4+6, MSE 1, MSE 5; lacZ reporter.]

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165

Hypothesis

TF concentrations + binding site map + (genome architecture, RNA, methylation, …)

predicts gene activation

Research Goals
•  Optimize Thermodynamic models
efficiently.
•  Analyze robustness of these
models.
•  Explore the regulation of a
particular gene.
•  Examine how the regulatory program evolves.
•  Extend current thermodynamic model.

Cooperphoto/CORBIS

Model definition
Site occupancy (Hill function)
$$p(s, t) = \frac{K_t \cdot K(s, t) \cdot [t]}{1 + K_t \cdot K(s, t) \cdot [t]}$$

Total activation
$$W(S, T) = \sum_{s \in S_A} E_{t_s} \, p(s, t_s) \prod_{s' \in S_R} \left(1 - E_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s')\right)$$
The term in front of the product is the activator contribution; the product is the quenching of the activator.

Transcription rate (Arrhenius function)
$$R(S, T) = \begin{cases} R_0 \exp\left(W(S, T) - G_0\right) & \text{if } W(S, T) < G_0 \\ R_0 & \text{otherwise} \end{cases}$$

Free parameters
TF PARAMS: $K$ (binding affinity), $E$ (effectiveness)
GENERAL PARAMS: $R_0$ (max. transcription rate), $G_0$ (energy barrier)

Buena Vista Pictures
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
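The three functions on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the tuple layout for binding sites and the box-shaped distance function `box_quench` (standing in for d(s, s')) are assumptions made for the example.

```python
import math

def occupancy(K_t, K_site, conc):
    """Hill-type occupancy p(s, t) of site s by TF t at concentration [t]."""
    x = K_t * K_site * conc
    return x / (1.0 + x)

def total_activation(activators, repressors, quench):
    """W(S, T): sum of activator contributions, each quenched by repressors.

    activators, repressors: lists of (position, E, p) tuples (assumed layout).
    quench: function d(s, s') of two positions, returning a value in [0, 1].
    """
    W = 0.0
    for pos_a, E_a, p_a in activators:
        remaining = 1.0
        for pos_r, E_r, p_r in repressors:
            remaining *= 1.0 - E_r * p_r * quench(pos_a, pos_r)
        W += E_a * p_a * remaining
    return W

def transcription_rate(W, R0, G0):
    """Arrhenius-type rate, capped at the maximum rate R0."""
    return R0 * math.exp(W - G0) if W < G0 else R0

def box_quench(pos_a, pos_r, max_range=100):
    """Hypothetical d(s, s'): full quenching within max_range bp, none beyond."""
    return 1.0 if abs(pos_a - pos_r) <= max_range else 0.0

# One activator site at position 0, one repressor site 50 bp away.
W = total_activation([(0, 2.0, 0.5)], [(50, 1.0, 0.5)], box_quench)
rate = transcription_rate(W, R0=100.0, G0=2.0)
```

The repressor halves the activator contribution here because it sits inside the assumed quenching range; moving it beyond 100 bp leaves the activator unaffected.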

Training the model

[Figure: training loop. Inputs: TF binding (binding site map) and TF concentration < [TF1], [TF2], [TF3], [TF4] >. The thermodynamic model produces a predicted expression; compare it to the target and adjust the model parameters to improve the fit.]

Optimization methods
•  Two optimization paradigms
–  Simulated Annealing
•  LAM schedule (Reinitz et al. 2003)
•  Geometric cooling
–  Gradient descent
•  Three GD variants approximating the objective function, which
was not continuously differentiable.
•  Judged on accuracy achieved in the given time
–  Drosophila MSE2 data with 400 data points and 7 TFs
(16 free parameters).
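As a rough illustration of the first paradigm, the sketch below runs simulated annealing with geometric cooling on a toy objective. The step size, temperatures, and cooling factor are arbitrary assumptions, the LAM schedule is not shown, and the toy objective stands in for the model's error function.

```python
import math
import random

def anneal_geometric(objective, start, t_start=1.0, t_end=1e-3,
                     alpha=0.95, moves_per_temp=50, step=0.1, seed=0):
    """Minimise objective(params) by simulated annealing, geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    current_err = objective(current)
    best, best_err = list(current), current_err
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            # Propose a move: perturb one randomly chosen parameter.
            candidate = list(current)
            i = rng.randrange(len(candidate))
            candidate[i] += rng.gauss(0.0, step)
            err = objective(candidate)
            # Metropolis criterion: always accept improvements, sometimes
            # accept a worse solution so the search can leave local minima.
            if err <= current_err or rng.random() < math.exp((current_err - err) / temp):
                current, current_err = candidate, err
                if err < best_err:
                    best, best_err = list(candidate), err
        temp *= alpha  # geometric cooling: T_{k+1} = alpha * T_k
    return best, best_err

def toy_objective(params):
    """Stand-in for the model error; a simple quadratic bowl."""
    return sum(x * x for x in params)

best, best_err = anneal_geometric(toy_objective, [2.0, -1.5])
```

Accepting occasional uphill moves is what distinguishes this from gradient descent, which only ever moves downhill from its starting point.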

Optimization
[Figure: two panels, Simulated Annealing and Gradient Descent, each plotting RMS error and CC against time [minutes]. Curves: SA LAM, SA geom, GD_softmax, GD_nomax, GD_max.]

Suggests: many local minima.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
regulation. Bioinformatics, 2009, 25, 1640-1646

If gradient descent gets
stuck in local minima all
the time, what does the
optimization landscape
look like?

Landscape analysis
•  Synthetic data based on real MSE2 data
–  global minimum and solution (parameter values) are
known.
–  Measuring distance of the optimization solution to the
starting position and the known solution.
–  Measuring error reduction at the
solution compared to the
starting position.
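The two measurements can be written down as follows. This is a sketch under assumptions: the slides do not say which distance measure was used, so Euclidean distance in parameter space is assumed, and `perturb` is a hypothetical helper for creating a start position near the known solution.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two parameter vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def error_reduction(err_start, err_final):
    """Fraction of the starting error removed by the optimization."""
    return 1.0 - err_final / err_start

def perturb(solution, fraction, rng):
    """Hypothetical helper: move each parameter by at most +/- fraction."""
    return [v * (1.0 + rng.uniform(-fraction, fraction)) for v in solution]

rng = random.Random(0)
known_solution = [1.0, 2.0, 4.0]             # made-up parameter values
start = perturb(known_solution, 0.01, rng)   # "1% perturbed" start position
initial_distance = distance(start, known_solution)
```

Comparing `initial_distance` with the distance of the optimized parameters to `known_solution` shows whether the search actually moved towards the known solution or settled in a nearby minimum.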

Landscape analysis
Experiment      Initial distance to   Final distance to   Error red.
                solution (mean)       solution (mean)     (mean)
1% perturbed    3.4·10^-4             2.8·10^-4           88%
random          0.1                   0.11                97%

Conclusion: many local minima.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
regulation. Bioinformatics, 2009, 25, 1640-1646

Does the model over-fit?
•  Cross-validation (5-fold)
Experiment   Mean RMS error (SE)   Mean CC (SE)
training     13.39 (0.004)         0.92 (4.8·10^-5)
testing      14.04 (0.005)         0.91 (5.7·10^-5)

•  Redundancy reduction
–  Not enough data to begin with
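A minimal sketch of the two scores and a 5-fold split is given below. It assumes that CC is the Pearson correlation coefficient and that data points are assigned to folds in an interleaved fashion; the slides specify neither.

```python
import math

def rms_error(pred, target):
    """Root mean square error between prediction and target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def pearson_cc(pred, target):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(target)
    mean_p, mean_t = sum(pred) / n, sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (sd_p * sd_t)

def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; every point is held out exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        test = set(held_out)
        train = [i for i in range(n) if i not in test]
        yield train, held_out

splits = list(k_fold_indices(10, k=5))
```

For each split, the model would be trained on the `train` indices and scored with `rms_error` and `pearson_cc` on both parts; a large gap between training and testing scores would indicate over-fitting.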

Summary: Optimization & Analysis
•  The objective function is
ill-posed.
–  It has a plethora of local
minima.
–  It might have many
global minima.
•  Hence SA is the
method of choice.
•  There might be a
tendency to over-fit the
data.
http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html
http://images.nciku.com/

Research Goals
•  Optimize Thermodynamic models
efficiently
•  Analyze robustness of these
models
•  Explore the regulation of a
particular gene
•  Examine how the regulatory program evolves
•  Extend current thermodynamic model

Cooperphoto/CORBIS

Regulation and Evolution of eve
•  Mechanism for regulating eve is
conserved:
–  Stripe 2 elements from other
Drosophila species activate
eve in D. mel. correctly.
–  Despite the substantial
difference in the
regulatory DNA
sequence.

http://www.bio.ilstu.edu/Edwards/

Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila
despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106

Evaluate Evolution of MSE2
•  Test if the model can identify the MSE2 in these
other species.

•  Test if the model correctly predicts the
transcriptional output of the homologous MSE2s.

Searching for MSE2
•  Apply a model trained on D. mel. MSE2 to the TFBS-map
from sequential windows to find the MSE2 in other
species
[Figure: a window slides along the eve region (MSE2, promoter, eve) of another species. For each window the predicted expression is compared with the target, giving one RMS error per window: < 23 27 43 … 13 … >]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
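The window search can be sketched as below. The callable `predict` is hypothetical: it stands in for building the TFBS map of a window and running the trained D. mel. model on it. The toy predictor and its numbers are made up for the example.

```python
import math

def rms_error(pred, target):
    """Root mean square error between prediction and target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def best_window(region_len, window_len, step, predict, target):
    """Return (start, rms_error) of the window whose prediction fits best."""
    best = None
    for start in range(0, region_len - window_len + 1, step):
        err = rms_error(predict(start, start + window_len), target)
        if best is None or err < best[1]:
            best = (start, err)
    return best

# Toy stand-in: the prediction matches the target only for the window
# starting at position 300 and degrades with distance from it.
target = [0.0, 50.0, 100.0]

def toy_predict(start, end):
    return [t + abs(start - 300) / 10.0 for t in target]

hit = best_window(1000, 200, 100, toy_predict, target)
```

The window with the lowest RMS error is reported as the candidate MSE2; plotting the error of every window against its genomic location gives a profile like the one on the result slide.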

Searching for MSE2: Result
•  Correctly identified the MSE2 in 6/8 species

[Figure: RMS error against genomic location, one panel per species: D. melanogaster, D. pseudoobscura, and a panel whose label is truncated to "rimshawi".]

Predicting the output in other species
•  Apply a model trained on D. mel. MSE2 to the MSE2s
in other species
[Figure, left: log odds score (bits) of the binding sites against rel. genomic position for D. melanogaster and D. mojavensis; TFs: bicoid, kruppel, giant, hunchback, knirps, caudal, tailless. Right: relative RNA concentration against A-P position (%) for the target and for D. melanogaster, D. pseudoobscura, D. ananassae, and D. mojavensis.]


Summary: Application
•  Model fits the data
qualitatively.
•  Predictions are biologically
meaningful.

•  However, there is room for
improvement.

One role fits them all?
•  Dual function is proposed for some of the regulatory
TFs.
–  E.g. TF Hunchback (Hb) might be an activator when
regulating stripe2 and repressor for stripe3.

[Figure: eve locus with its regulatory elements: Late1, 3+7, 2, P, late2, 4+6, 1, 5.]

Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the
Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
Drosophila. PLoS Biol, 2004, 2, E271

Determine the regulatory role of TFs
•  Different data set: 44 CRMs important for D. mel.
development but same set of TFs.
•  Determine the best role for each TF in each of the
CRMs
–  Brute Force: train a model for all TF role-combinations on
each of the 44 CRMs.
–  Record the correlation achieved.
–  Identify TFs that have dual-function.

Segal, E. et al. Predicting expression patterns from regulatory sequence in
Drosophila segmentation. Nature, 2008, 451, 535-540
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
SUMOylation in the developmental gene network of Drosophila melanogaster, submitted
for publication, 2009
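The brute-force step can be sketched as follows. The callable `score` is hypothetical: it stands in for training a model with the given role assignment on one CRM and returning the correlation achieved. With n TFs and two roles there are 2^n combinations per CRM.

```python
from itertools import product

ROLES = ("activator", "repressor")

def best_role_assignment(tfs, score):
    """Try every activator/repressor combination; keep the best-scoring one."""
    best_roles, best_cc = None, float("-inf")
    for combo in product(ROLES, repeat=len(tfs)):
        roles = dict(zip(tfs, combo))
        cc = score(roles)  # hypothetical: train on one CRM, return its CC
        if cc > best_cc:
            best_roles, best_cc = roles, cc
    return best_roles, best_cc

def dual_function_tfs(best_roles_per_crm):
    """TFs whose best role differs between CRMs."""
    seen = {}
    for roles in best_roles_per_crm.values():
        for tf, role in roles.items():
            seen.setdefault(tf, set()).add(role)
    return sorted(tf for tf, roles in seen.items() if len(roles) > 1)

# Toy scores for two made-up CRMs: Hb works best as an activator on the
# first and as a repressor on the second; Kr is a repressor on both.
tfs = ["Hb", "Kr"]
roles_1, _ = best_role_assignment(
    tfs, lambda r: (r["Hb"] == "activator") + (r["Kr"] == "repressor"))
roles_2, _ = best_role_assignment(
    tfs, lambda r: (r["Hb"] == "repressor") + (r["Kr"] == "repressor"))
dual = dual_function_tfs({"crm_1": roles_1, "crm_2": roles_2})
```

A TF is flagged as dual-functioning here as soon as its best role differs between any two CRMs; a stricter criterion (for example a minimum number of CRMs per role) would be a straightforward change.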

TFs with dual role
                         Bcd  Cad  Hb  Tll  Gt   Kr  Kni  TorRE
Det. roles               s    +    s   -    s    s   -    s
Literature (consensus)   +    +    s   -    (s)  s   -    NA

“s”: dual-functioning, “+”: activator, “-”: repressor.

•  E.g. Hb
–  Activator for 17 CRMs
–  Repressor for 27 CRMs

Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster.
PLoS Comput Biol, 2006, 2, e51
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
Drosophila. PLoS Biol, 2004, 2, E271

Improvement with dual function
[Figure: mRNA against AP position for the CRMs kr_CD1_ru, hb_anterior_actv, run_stripe5, and eve_37ext_ru. Curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]

Experiment       Number of free parameters   Mean CC (SE)
Previous roles   18                          0.27 (0.008)
HbDual           19                          0.35 (0.009)
KrDual           19                          0.37 (0.007)
HbKrDual         20                          0.38 (0.007)

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
SUMOylation in the developmental gene network of Drosophila melanogaster, submitted
for publication, 2009

Marker motifs for dual function
•  Running MEME on the protein sequence of dual-
functioning TFs to find short motifs (<6aa) present
in all of them.

[Figure: two MEME sequence logos (bits against positions 1 to 4; MEME (no SSC), 15.07.09 12:07), one of them labelled as a SUMOylation motif.]
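To make the idea of a motif "present in all of them" concrete, the sketch below lists the exact k-mers shared by every sequence. This is a deliberately simpler technique than MEME, which fits probabilistic motif models and tolerates mismatches; the protein sequences are made up for the example.

```python
def shared_kmers(sequences, k):
    """Return the k-mers that occur in every sequence (exact matches only)."""
    common = None
    for seq in sequences:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        common = kmers if common is None else common & kmers
    return sorted(common or [])

# Made-up toy protein fragments, not real TF sequences.
toy_proteins = ["MAIKTEQ", "GGIKTEA", "IKTEPP"]
motifs = shared_kmers(toy_proteins, 4)
```

Exact matching misses degenerate positions, which is why a tool such as MEME, which reports a position-specific logo, is used for the real analysis.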

SUMOylation
•  Small Ubiquitin-related Modifier (SUMO): a small protein
covalently attached to target proteins.
•  Involved in many pathways/mechanisms
–  Compartmentisation
–  Transcriptional regulation
•  Can reverse the function of a TF, e.g. Ikaros (the human
homologue of Kr)
•  SUMO (Smt3) is present in D. mel during development

[Figure: SUMO pathway. SUMO (SU) is processed by a SUMO protease, activated by the E1 activating enzyme (ATP), passed to the E2 conjugating enzyme, and attached to the target protein with the help of E3 ligases.]

Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in
developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009,
in submission
del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005,
25, 2688-2697

Conclusion
•  Thermodynamic models are best optimized using SA, but
over-fitting is an issue to keep in mind.
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646

•  Nonetheless, they are applicable for
–  examining the mechanisms of transcriptional regulation,
–  exploring the evolution of a particular regulatory mechanism.
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220

•  Model prediction improves when dual function is allowed.
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila
melanogaster, submitted for publication, 2009

–  SUMOylation seems to be a good candidate for the biological
mechanism of role change.
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster.
Neurocomputing, 2009, in submission

Acknowledgments
•  IMB
–  Timothy Bailey (supervisor)
–  Mikael Bodén (supervisor)
–  Sean Grimmond (thesis committee)
–  Nick Hamilton (thesis committee)
–  Fabian Buske
–  Stefan Maetschke
•  Stony Brook University
–  John Reinitz
•  Funding
–  Institute for Molecular Bioscience, The University of
Queensland
–  Australian Research Council Centre of Excellence in
Bioinformatics
–  National Institutes of Health
–  UQ International Research Tuition Award


www.bioinformatics.org.au/stream

•  Framework for modeling, visualizing,
and predicting the regulation of the
transcription rate of a target gene.
•  Publicly available
•  Modular: New functions can be
plugged in

Many functions
Command line

Bauer, D. C. & Bailey, T. L. STREAM - Static Thermodynamic REgulAtory Model for
transcriptional. Bioinformatics, 2008, 24, 2544-2545
