Deciphering the regulatory
                        code in the genome
                      PhD completion seminar
                      Denis C. Bauer

                      Institute for Molecular Bioscience
                      The University of Queensland,
                      Australia


Research Aim
Develop a method that translates the
 regulatory message in the DNA into when and
 how strongly a gene is expressed.

[Figure: a thermodynamic model reads a DNA sequence (AAGAAGGTTTTAGTTTAGCC...) as the message "Express gene with 70% capacity when it is hot, thanks!"]
Why is understanding transcriptional
      regulation important?
•  Insight into the biology of gene pathways.
•  Search for regulatory regions with specific function.
•  “Re-programming” of genes has therapeutic
   potential.


[Figure: DNA with a promoter, a gene, and transcription; a broken regulatory element is repaired by designing and inserting a new regulatory element.]
What do we need to know to build a model able
to translate the regulatory message?
Background: Enhancer
•  Genes can have independent "switches" (enhancers)
   beyond the core promoter, which can start the
   transcription of the target gene under different
   conditions.
[Figure: a gene with its promoter and several enhancer regions that switch on transcription.]
Background: Enhancer
   •  Transcription is regulated by the binding of activator
      and repressor TFs to an enhancer region.

[Figure: an enhancer and its binding site map; given the TF concentrations, 8 activators and 2 repressors are bound and transcription is active.]
Background: Repression
   •  Transcriptional regulation is also dependent on the
      interplay between activators and repressors, i.e.
      where they bind relative to each other.
[Figure: binding site map of an enhancer, with the repressor range marked.]
On which system would we test the model's
abilities?
Background: Even-skipped gene (eve)
[Figures: Drosophila melanogaster (1); embryo stained for eve (2); function representation (3)]

                                   1 http://insects.eugenes.org/
                                   2 Small et al.
                                   3 http://bioinform.genetika.ru
Background: Regulation of eve
[Figure: the eve locus with the elements Late1, MSE3+7, MSE2, promoter (P), eve, late2, MSE4+6, MSE1, and MSE5; lacZ reporter.]

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
Hypothesis


[Figure: TF concentrations; binding site map; genome (architecture, RNA, methylation, …)]

predicts gene activation
Research Goals
•  Optimize thermodynamic models
   efficiently.
•  Analyze robustness of these
   models.
•  Explore the regulation of a
   particular gene.
•  Examine how the regulatory program evolves.
•  Extend the current thermodynamic model.
Model definition
 Site occupancy (Hill function)
   \[ p(s, t) = \frac{K_t \cdot K(s, t) \cdot [t]}{1 + K_t \cdot K(s, t) \cdot [t]} \]

 Total activation
   \[ W(S, T) = \sum_{s \in S^A} E_{t_s} \, p(s, t_s) \prod_{s' \in S^R} \left( 1 - E_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s') \right) \]
   The factor E_{t_s} p(s, t_s) is the activator contribution; the product is the quenching of the activator.

 Transcription rate (Arrhenius function)
   \[ R(S, T) = \begin{cases} R_0 \exp\left( W(S, T) - G_0 \right) & \text{if } W(S, T) < G_0 \\ R_0 & \text{otherwise} \end{cases} \]

 Free parameters
   TF params
     K      Binding affinity
     E      Effectiveness
   General params
     R_0    Max. transcription rate
     G_0    Energy barrier

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the
Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
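The three equations of the model definition can be sketched in a few lines of Python. This is a minimal illustration, not the STREAM implementation: the slides do not define the distance function d(s, s'), so `step_quenching` with a 100 bp range is a hypothetical stand-in, and all numbers in the toy example are made up.

```python
import math

def occupancy(K_t, K_st, conc):
    """Site occupancy p(s, t) = x / (1 + x) with x = K_t * K(s, t) * [t]."""
    x = K_t * K_st * conc
    return x / (1.0 + x)

def step_quenching(pos_a, pos_r, max_range=100):
    """Hypothetical d(s, s'): full quenching within max_range bp, none beyond."""
    return 1.0 if abs(pos_a - pos_r) <= max_range else 0.0

def total_activation(activators, repressors, d=step_quenching):
    """W(S, T): sum of activator contributions, each quenched by every repressor.

    activators, repressors: lists of (position, E, p) tuples, one per bound site.
    """
    W = 0.0
    for pos_a, E_a, p_a in activators:
        unquenched = 1.0
        for pos_r, E_r, p_r in repressors:
            unquenched *= 1.0 - E_r * p_r * d(pos_a, pos_r)
        W += E_a * p_a * unquenched
    return W

def transcription_rate(W, R0, G0):
    """R(S, T) = R0 * exp(W - G0) if W < G0, else R0."""
    return R0 * math.exp(W - G0) if W < G0 else R0

# Toy example (made-up values): one activator site, one repressor site 50 bp away.
p_act = occupancy(K_t=2.0, K_st=1.0, conc=0.5)   # x = 1.0, so p = 0.5
p_rep = occupancy(K_t=1.0, K_st=1.0, conc=1.0)   # x = 1.0, so p = 0.5
W = total_activation([(0, 4.0, p_act)], [(50, 1.0, p_rep)])
rate = transcription_rate(W, R0=255.0, G0=3.0)
```

In the toy example the repressor halves the activator contribution, so W stays below G_0 and the rate is scaled down from R_0 by exp(W - G_0).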
Training the model




[Figure: training loop. The TF binding map and the TF concentrations < [TF1], [TF2], [TF3], [TF4] > are fed into the thermodynamic model; the predicted expression is compared to the target, and the model parameters are adjusted to improve the fit.]
Optimization methods
•  Two optimization paradigms
   –  Simulated Annealing
      •  LAM schedule (Reinitz et al. 2003)
      •  Geometric cooling
   –  Gradient descent
      •  Three GD variants approximating the objective function, which
         was not continuously differentiable.
•  Judged on the accuracy achieved in a given time
   –  Drosophila MSE2 data with 400 data points and 7 TFs
      (16 free parameters).
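As an illustration of the first paradigm, the following is a minimal sketch of simulated annealing with geometric cooling. It is not the optimizer used in the study: the proposal distribution, the starting temperature, the cooling factor `alpha`, and the toy objective `rugged` are all assumptions chosen for the example, and the LAM schedule is not shown.

```python
import math
import random

def simulated_annealing(objective, x0, t0=1.0, alpha=0.99, steps=2000,
                        step_size=0.1, seed=0):
    """Minimize objective(x) with Metropolis moves and geometric cooling."""
    rng = random.Random(seed)
    x, err = list(x0), objective(x0)
    best_x, best_err = list(x), err
    temp = t0
    for _ in range(steps):
        # Propose a move by perturbing one randomly chosen parameter.
        cand = list(x)
        i = rng.randrange(len(cand))
        cand[i] += rng.gauss(0.0, step_size)
        cand_err = objective(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_err <= err or rng.random() < math.exp((err - cand_err) / temp):
            x, err = cand, cand_err
            if err < best_err:
                best_x, best_err = list(x), err
        temp *= alpha  # geometric cooling: T_{k+1} = alpha * T_k
    return best_x, best_err

def rugged(x):
    """Toy objective with many local minima (hypothetical, for illustration only)."""
    return sum(v * v + 2.0 * math.sin(5.0 * v) ** 2 for v in x)

start = [2.0, -1.5]
best_x, best_err = simulated_annealing(rugged, start)
```

Accepting some uphill moves while the temperature is high is what lets the search leave a local minimum, which a pure gradient descent cannot do.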
Optimization
[Figure: two panels, Simulated Annealing (SA LAM, SA geom) and Gradient Descent (SA_geom, GD_softmax, GD_nomax, GD_max), each showing CC and RMS error against time [minutes].]

                            Suggests: many local minima.
                                 Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
                                 regulation. Bioinformatics, 2009, 25, 1640-1646
If gradient descent gets stuck in local minima all
the time, what does the optimization landscape
look like?
Landscape analysis
•  Synthetic data based on real MSE2 data
  –  global minimum and solution (parameter values) are
     known.
  –  Measuring distance of the optimization solution to the
     starting position and the known solution.
  –  Measuring error reduction at the
     solution compared to the
     starting position.
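The two landscape measures can be sketched as follows. The slides do not state which distance metric was used, so Euclidean distance between parameter vectors is an assumption, as is defining error reduction as the fraction of the starting error that was removed; the parameter vectors and error values are made up.

```python
import math

def distance(a, b):
    """Euclidean distance between two parameter vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def error_reduction(err_start, err_final):
    """Fraction of the starting error removed by the optimization."""
    return 1.0 - err_final / err_start

# Toy example (made-up values): known solution, perturbed start, optimizer result.
solution = [1.0, 2.0, 3.0]
start = [1.0, 2.0, 3.5]
final = [1.0, 2.0, 3.4]

initial_dist = distance(start, solution)   # how far the start is from the known solution
final_dist = distance(final, solution)     # how far the result is from the known solution
moved = distance(final, start)             # how far the optimizer travelled
reduction = error_reduction(err_start=100.0, err_final=12.0)
```

A large error reduction combined with a final distance that is barely smaller than the initial distance indicates that the optimizer settled in a nearby minimum rather than at the known solution.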
Landscape analysis
Experiment      Initial distance to     Final distance to     Error reduction
                solution (mean)         solution (mean)       (mean)
1% perturbed    3.4·10^-4               2.8·10^-4             88%
random          0.1                     0.11                  97%

Conclusion: many local minima.

                       Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional
                       regulation. Bioinformatics, 2009, 25, 1640-1646
Does the model over-fit?
•  Cross-validation (5-fold)
            Experiment   Mean RMS error (SE)   Mean CC (SE)
            training     13.39 (0.004)         0.92 (4.8·10^-5)
            testing      14.04 (0.005)         0.91 (5.7·10^-5)



•  Redundancy reduction
   –  Not enough data to begin with
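A 5-fold cross-validation with the two reported scores (RMS error and correlation coefficient, CC) can be sketched as below. How the folds were drawn is not stated on the slide, so contiguous folds are an assumption, and the `fit` and `predict` callbacks are hypothetical placeholders for training and applying the thermodynamic model.

```python
import math

def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def rms_error(pred, target):
    """Root mean square error between prediction and target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def pearson_cc(pred, target):
    """Pearson correlation coefficient between prediction and target."""
    n = len(target)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    return cov / (sp * st)

def cross_validate(targets, fit, predict, k=5):
    """Return one (train_rms, test_rms) pair per fold.

    fit(train_idx) -> model and predict(model, idx) -> list are caller-supplied.
    """
    results = []
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        model = fit(train_idx)
        train_rms = rms_error(predict(model, train_idx), [targets[i] for i in train_idx])
        test_rms = rms_error(predict(model, test_idx), [targets[i] for i in test_idx])
        results.append((train_rms, test_rms))
    return results

# Toy stand-in for the model: predict the training mean everywhere.
targets = [float(i % 10) for i in range(400)]
fit = lambda idx: sum(targets[i] for i in idx) / len(idx)
predict = lambda model, idx: [model] * len(idx)
scores = cross_validate(targets, fit, predict)
```

A testing error that is clearly higher than the training error across folds is the usual sign of over-fitting; in the table above the gap is small (14.04 against 13.39).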
Summary: Optimization & Analysis
•  The objective function is
   ill-posed.
   –  It has a plethora of local
      minima.
   –  It might have many
      global minima.
•  Hence SA is the
   method of choice.
•  There might be a
   tendency to over-fit the
   data.
                                   http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html
                                                                        http://images.nciku.com/
Regulation and Evolution of eve
•  Mechanism for regulating eve is
   conserved:
   –  Stripe 2 elements from other
      Drosophila species activate
      eve in D. mel. correctly.
   –  Despite the substantial
      difference in the
      regulatory DNA
      sequence.

                                                                                http://www.bio.ilstu.edu/Edwards/

                    Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila
                    despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
Evaluate Evolution of MSE2
•  Test if the model can identify the MSE2 in these
   other species.

•  Test if the model correctly predicts the
   transcriptional output of the homologous MSE2s.
Searching for MSE2
•  Apply a model trained on D. mel. MSE2 to the TFBS maps
   of sequential windows to find the MSE2 in other
   species.
[Figure: the region around eve in another species (MSE2, promoter, eve) is scanned in sequential windows; the predicted expression for each window is compared to the target, giving one RMS error per window: < 23 27 43 … 13 … >]

                         Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
                         and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
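The window scan can be sketched as follows: every window is scored by the RMS error between the model's prediction and the target expression, and the window with the lowest error is reported. The window size, the step, and the `predict_window` callback (which would build the TFBS map of the window and apply the trained model) are hypothetical; the toy predictor simply plants a perfect match at one position.

```python
import math

def rms_error(pred, target):
    """Root mean square error between prediction and target."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def scan_for_module(region_length, window_size, step, predict_window, target):
    """Score each window by its RMS error to the target.

    Returns the best (start, end) window and the list of (error, window) pairs.
    """
    errors = []
    for start in range(0, region_length - window_size + 1, step):
        window = (start, start + window_size)
        errors.append((rms_error(predict_window(window), target), window))
    best_err, best_window = min(errors)
    return best_window, errors

# Toy example (made-up values): the "model" reproduces the target only at 1000-1500.
target = [0.0, 50.0, 150.0, 50.0, 0.0]

def toy_predict(window):
    return list(target) if window == (1000, 1500) else [0.0] * len(target)

best_window, errors = scan_for_module(3000, 500, 250, toy_predict, target)
```

Plotting the error of every window against its genomic location gives the kind of profile shown on the result slide, where the module appears as a dip in RMS error.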
Searching for MSE2: Result
•  Correctly identified the MSE2 in 6/8 species




[Figure: RMS error against genomic location for D. melanogaster, D. pseudoobscura, and D. grimshawi.]

                             Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
                             and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Predicting the output in other species
•  Apply a model trained on D. mel. MSE2 to the MSE2s
   in other species

[Figure, left: binding site maps of the MSE2 in D. melanogaster and D. mojavensis, log odds score (bits) against rel. genomic position, for bicoid, kruppel, giant, hunchback, knirps, caudal, and tailless.]
[Figure, right: relative RNA concentration against A-P position (%) for the target and for D. melanogaster, D. pseudoobscura, D. ananassae, and D. mojavensis.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules
and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Summary: Application
•  Model fits the data
   qualitatively.
•  Predictions are biologically
   meaningful.

•  However, there is room for
   improvement.
One role fits them all?
•  A dual function has been proposed for some of the regulatory
   TFs.
   –  E.g. the TF Hunchback (Hb) might be an activator when
      regulating stripe 2 and a repressor when regulating stripe 3.

[Figure: the eve locus with the elements Late1, MSE3+7, MSE2, promoter (P), late2, MSE4+6, MSE1, and MSE5.]

                                                 Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the
                                                 Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
                                                 Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
                                                 Drosophila. PLoS Biol, 2004, 2, E271
Determine the regulatory role of TFs
•  Different data set: 44 CRMs important for D. mel.
   development, but the same set of TFs.
•  Determine the best role for each TF in each of the
   CRMs
   –  Brute force: train a model for all TF role combinations on
      each of the 44 CRMs.
   –  Record the correlation achieved.
   –  Identify TFs that have a dual function.


                     Segal, E. et al. Predicting expression patterns from regulatory sequence in
                     Drosophila segmentation. Nature, 2008, 451, 535-540
                     Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
                     SUMOylation in the developmental gene network of Drosophila melanogaster, submitted
                     for publication, 2009
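The brute-force search can be sketched as an enumeration of all activator/repressor assignments (2^8 = 256 per CRM for eight TFs). The `train_and_score` callback, which would train a model under a given role assignment and return the correlation achieved, is a hypothetical placeholder, and calling a TF dual-functioning when its best role differs between CRMs is an assumption about how the results were read. The toy scorer and the CRM names "A" and "B" are made up.

```python
from itertools import product

TFS = ["Bcd", "Cad", "Hb", "Tll", "Gt", "Kr", "Kni", "TorRE"]

def best_roles_per_crm(crms, tfs, train_and_score):
    """For each CRM, try every activator/repressor assignment and keep the best."""
    best = {}
    for crm in crms:
        scored = []
        for roles in product("+-", repeat=len(tfs)):
            assignment = dict(zip(tfs, roles))
            scored.append((train_and_score(crm, assignment), roles))
        best_cc, best_roles = max(scored)
        best[crm] = (dict(zip(tfs, best_roles)), best_cc)
    return best

def dual_function_tfs(best, tfs):
    """TFs whose best role is '+' in at least one CRM and '-' in another."""
    return [tf for tf in tfs
            if len({roles[tf] for roles, _ in best.values()}) == 2]

# Toy scorer (made-up preferences): the score is the fraction of TFs whose role
# matches a hidden preferred assignment; only Hb differs between the two CRMs.
preferred = {
    "A": dict(zip(TFS, "+++-----")),
    "B": dict(zip(TFS, "++------")),
}

def toy_score(crm, assignment):
    hits = sum(assignment[tf] == preferred[crm][tf] for tf in TFS)
    return hits / len(TFS)

best = best_roles_per_crm(["A", "B"], TFS, toy_score)
dual = dual_function_tfs(best, TFS)
```

The cost grows exponentially with the number of TFs, which is manageable here only because the TF set is small.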
TFs with dual role
               Bcd   Cad   Hb   Tll   Gt    Kr   Kni   TorRE
 Det. roles    s     +     s    -     s     s    -     s
 Literature    +     +     s    -     (s)   s    -     NA
 (consensus)

 "s": dual-functioning, "+": activator, "-": repressor.

•  E.g. Hb
     –  Activator for 17 CRMs
     –  Repressor for 27 CRMs

                                       Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster.
                                       PLoS Comput Biol, 2006, 2, e51
                                       Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of
                                       Drosophila. PLoS Biol, 2004, 2, E271
Improvement with dual function
[Figure: mRNA against AP position for the CRMs kr_CD1_ru, hb_anterior_actv, run_stripe5, and eve_37ext_ru, comparing the target with the predictions for previous roles, HbDual, KrDual, HbKrDual, and best.]

Experiment        Number of free parameters   Mean CC (SE)
Previous roles    18                          0.27 (0.008)
HbDual            19                          0.35 (0.009)
KrDual            19                          0.37 (0.007)
HbKrDual          20                          0.38 (0.007)

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by
SUMOylation in the developmental gene network of Drosophila melanogaster, submitted
for publication, 2009
Marker motifs for dual function
•  Running MEME on the protein sequence of dual-
   functioning TFs to find short motifs (<6aa) present
   in all of them.




[Figure: two MEME sequence logos of length 4 (information content in bits per position); label: SUMOylation motif.]
SUMOylation
•  Small Ubiquitin-related Modifier (SUMO): a
   small protein covalently attached
   to target proteins.
•  Involved in many pathways/
   mechanisms
    –  Compartmentalisation
    –  Transcriptional regulation
        •  Can reverse the function of a TF, e.g.
           Ikaros (the human homologue of Kr)
•  SUMO (Smt3) is present in D. mel. during development

[Figure: the SUMO pathway. SUMO (SU) is activated by the E1 activating enzyme using ATP, passed to the E2 conjugating enzyme, attached to the target protein with an E3 ligase, and removed by a SUMO protease.]

                          Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in
                          developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009,
                          in submission
                          del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005,
                          25, 2688-2697
Conclusion
•  Thermodynamic models can be best optimized using SA but
   over-fitting is an issue to keep in mind.
       Bauer, D. C. & Bailey, T. L. OpJmizing staJc thermodynamic models of transcripJonal regulaJon. BioinformaJcs, 2009, 25, 1640‐1646  



•  Non-the-less, they are applicable for
   –  examining the mechanisms of transcriptional regulation,
   –  explore the evolution of a particular regulatory mechanism
       Bauer, D. C. & Bailey, T. L. Studying the funcJonal conservaJon of cis‐regulatory modules and their transcripJonal output. BMC BioinformaJcs, 2008, 9, 220   



•  Model prediction improves when dual-function is allowed.
       Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila
       melanogaster. Submitted for publication, 2009


   –  SUMOylation seems to be a good candidate for the biological
      mechanism of role-change.
       Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster.
       Neurocomputing, 2009, in submission
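The first conclusion, that simulated annealing copes with an objective full of local minima, can be illustrated with a minimal sketch. It uses Metropolis acceptance with geometric cooling on a toy multi-modal surface (the Rastrigin function), not the thermodynamic model's actual objective. The function names, step size, cooling factor, and starting point are assumptions chosen for the example, not settings from the cited work.

```python
import math
import random


def simulated_annealing(objective, start, n_steps=20000, t_start=1.0,
                        cooling=0.9995, step_size=0.5, seed=0):
    """Minimise `objective` with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    current_err = objective(current)
    best, best_err = list(current), current_err
    temperature = t_start
    for _ in range(n_steps):
        # Propose a move by perturbing one randomly chosen parameter.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)
        candidate_err = objective(candidate)
        delta = candidate_err - current_err
        # Improvements are always accepted; uphill moves are accepted with
        # probability exp(-delta / T), which lets the search leave local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_err = candidate, candidate_err
            if current_err < best_err:
                best, best_err = list(current), current_err
        temperature *= cooling  # geometric cooling schedule
    return best, best_err


def rastrigin(x):
    """Toy surface with many local minima; a stand-in, not the model's error."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)


start = [3.0, -2.5]
best, best_err = simulated_annealing(rastrigin, start)
print(best, best_err)
```

A plain gradient step started at the same point would settle in the nearest basin. The occasional uphill moves at high temperature are what allow the search to leave it.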
Acknowledgments
•  IMB
    –  Timothy Bailey (supervisor)
    –  Mikael Bodén (supervisor)
    –  Sean Grimmond (thesis committee)
    –  Nick Hamilton (thesis committee)
    –  Fabian Buske
    –  Stefan Maetschke
•  Stony Brook University
    –  John Reinitz
•  Funding
    –  Institute for Molecular Bioscience, The University of Queensland
    –  Australian Research Council Centre of Excellence in Bioinformatics
    –  National Institutes of Health
    –  UQ International Research Tuition Award




                            Framework for modeling, visualizing, and predicting the
                            regulation of the transcription rate of a target gene
                              www.bioinformatics.org.au/stream
www.bioinformatics.org.au/stream


•  Framework for modeling, visualizing,
   and predicting the regulation of the
   transcription rate of a target gene.
•  Publicly available
•  Modular: New functions can be
   plugged in




[Figure: STREAM screenshots, labelled "Command line" and "Many functions"]




                             Bauer, D.C. and Bailey, T.L. STREAM - Static Thermodynamic REgulAtory Model for
                             transcriptional. Bioinformatics, 2008, 24, 2544-2545.
