Deciphering the regulatory code in the genome

There are messages hidden within our genome, regulating when and how long a gene is switched on. The presentation describes a method, STREAM, targeted at deciphering this regulatory code.

Published in: Education

Deciphering the regulatory code in the genome

  1. Deciphering the regulatory code in the genome
     PhD completion seminar
     Denis C. Bauer, Institute for Molecular Bioscience, The University of Queensland, Australia
     Image credits: yankodesign; linh.ngân
  2. Research Aim
     Develop a method that translates the regulatory message in the DNA that specifies when and how strongly a gene is expressed.
     [Figure: a thermodynamic model translates a DNA sequence (AAGAAGGTTTTAGTTTAGCC CACCGTAGGTACCTGAAGAA GAAGGTTTTAGTTTAGCCCA CCGTAGGTACCTGAAG) into the message "Express gene with 70% capacity when it is hot, Thanks!"]
  3. Why is understanding transcriptional regulation important?
     • Insight into the biology of gene pathways.
     • Search for regulatory regions with a specific function.
     • "Re-programming" of genes has therapeutic potential.
     [Figure: DNA with promoter, gene, and transcription; a broken regulatory element is repaired by designing and inserting a new regulatory element.]
  4. What do we need to know to build a model able to translate the regulatory message?
  5. Background: Enhancer
     • Genes can have independent "switches" (enhancers) beyond the core promoter, which can start the transcription of the target gene under different conditions.
     [Figure labels: enhancer regions, promoter, gene, transcription.]
  6. Background: Enhancer
     • Transcription is regulated by the binding of activator and repressor TFs to an enhancer region.
     [Figure labels: enhancer, binding site map, active TF, concentration, 8 activators, 2 repressors, transcription.]
  7. Background: Repression
     • Transcriptional regulation also depends on the interplay between activators and repressors, i.e. where they bind relative to each other.
     [Figure labels: enhancer, binding site map, repressor range.]
  8. On which system would we test the model's abilities?
  9. Background: Even-skipped gene (eve)
     • Drosophila melanogaster [1]
     • Embryo stained for eve [2]
     • Function representation [3]
     Sources: [1] http://insects.eugenes.org/  [2] Small et al.  [3] http://bioinform.genetika.ru
  10. Background: Regulation of eve
     [Figure: map of the eve regulatory region with MSEs. Labelled elements, in order: late1, 3+7, 2, P, late2, 4+6, 1, 5. Also shown: eve, lacZ.]
     Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
  11. Hypothesis
     • TF concentrations
     • Binding site map
     • Genome architecture, RNA, methylation, …
     predicts gene activation
  12. Research Goals
     • Optimize thermodynamic models efficiently.
     • Analyze robustness of these models.
     • Explore the regulation of a particular gene.
     • Examine how the regulatory program evolves.
     • Extend the current thermodynamic model.
     Image credit: Cooperphoto/CORBIS
  13. Model definition
     Site occupancy (Hill function):
       $p(s, t) = \frac{K_t \cdot K(s, t) \cdot [t]}{1 + K_t \cdot K(s, t) \cdot [t]}$
     Total activation (activator contribution, reduced by quenching of the activator):
       $W(S, T) = \sum_{s \in S^A} E_{t_s} \, p(s, t_s) \prod_{s' \in S^R} \left( 1 - E_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s') \right)$
     Transcription rate (Arrhenius function):
       $R(S, T) = R_0 \exp\left( W(S, T) - G_0 \right)$ if $W < G_0$, and $R(S, T) = R_0$ otherwise
     Free parameters
     • TF parameters: K (binding affinity), E (effectiveness)
     • General parameters: R_0 (max. transcription rate), G_0 (energy barrier)
     Image credit: Buena Vista Pictures
     Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
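The three equations on this slide can be sketched in a few lines of Python. This is an illustration only, not the STREAM implementation: the function names, the tuple layout for binding sites, and the step-shaped distance function `step_distance` (including its `max_range` default) are assumptions made for the example, since the slide does not specify the form of d(s, s').

```python
import math

def occupancy(k_t, k_site, conc):
    """Hill-type occupancy p(s, t) of one binding site."""
    x = k_t * k_site * conc
    return x / (1.0 + x)

def step_distance(activator_pos, repressor_pos, max_range=100):
    """Hypothetical d(s, s'): 1 inside the repressor range, 0 outside."""
    return 1.0 if abs(activator_pos - repressor_pos) <= max_range else 0.0

def total_activation(activators, repressors, distance=step_distance):
    """W(S, T). Each site is a (position, effectiveness E, occupancy p) tuple."""
    w = 0.0
    for a_pos, a_eff, a_occ in activators:
        quench = 1.0
        for r_pos, r_eff, r_occ in repressors:
            # Each nearby repressor scales down this activator's contribution.
            quench *= 1.0 - r_eff * r_occ * distance(a_pos, r_pos)
        w += a_eff * a_occ * quench
    return w

def transcription_rate(w, r0, g0):
    """Arrhenius-type rate: exponential below the barrier G0, capped at R0."""
    return r0 * math.exp(w - g0) if w < g0 else r0
```

In this sketch a fully occupied, fully effective repressor within range silences an activator completely, while a repressor outside the range leaves it untouched.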
  14. Training the model
     [Figure: training loop. TF binding sites and TF concentrations <[TF1], [TF2], [TF3], [TF4]> feed the thermodynamic model; the predicted expression is compared to the target, and the model parameters are adjusted to improve the fit.]
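The comparison between predicted and target expression needs a numeric measure of fit. The later slides report RMS error and a correlation coefficient (CC), so a minimal sketch of both is given here. The function names are illustrative, and CC is assumed to be the Pearson correlation coefficient, which the slides do not state explicitly.

```python
import math

def rms_error(predicted, target):
    """Root-mean-square difference between two equally long profiles."""
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))

def correlation(predicted, target):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)
```

RMS error penalises differences in absolute level, whereas the correlation only rewards matching shape, which is presumably why both are reported.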
  15. Optimization methods
     • Two optimization paradigms
       – Simulated Annealing
         • LAM schedule (Reinitz et al. 2003)
         • Geometric cooling
       – Gradient descent
         • Three GD variants approximating the objective function, which was not continuously differentiable.
     • Judged on accuracy achieved in the given time
       – Drosophila MSE2 data with 400 data points and 7 TFs (16 free parameters).
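As a rough illustration of the simpler of the two schedules, the sketch below runs simulated annealing with geometric cooling on an arbitrary objective function. It is a generic textbook version rather than the code used in the study: the proposal step (Gaussian perturbation of one parameter), the default temperatures, and the cooling factor `alpha` are all assumptions. The LAM schedule is not shown.

```python
import math
import random

def anneal(objective, start, step=0.5, t_start=1.0, t_end=1e-3,
           alpha=0.95, moves_per_temp=50, seed=0):
    """Minimise `objective` with simulated annealing and geometric cooling."""
    rng = random.Random(seed)
    current, current_err = list(start), objective(start)
    best, best_err = list(current), current_err
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidate = list(current)
            i = rng.randrange(len(candidate))
            candidate[i] += rng.gauss(0.0, step)  # perturb one parameter
            err = objective(candidate)
            # Always accept improvements; accept worse moves with
            # probability exp(-increase / temperature).
            if err <= current_err or rng.random() < math.exp((current_err - err) / temp):
                current, current_err = candidate, err
                if err < best_err:
                    best, best_err = list(candidate), err
        temp *= alpha  # geometric cooling: T_next = alpha * T
    return best, best_err
```

Accepting occasional uphill moves is what lets the search leave a local minimum, which plain gradient descent cannot do.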
  16. Optimization
     [Figure: RMS error and CC plotted against time [minutes] for Simulated Annealing (SA LAM, SA geom) and Gradient Descent (GD_softmax, GD_nomax, GD_max, with SA_geom for comparison).]
     Suggests: many local minima.
     Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
  17. If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?
  18. Landscape analysis
     • Synthetic data based on real MSE2 data
       – The global minimum and the solution (parameter values) are known.
       – Measure the distance of the optimization solution to the starting position and to the known solution.
       – Measure the error reduction at the solution compared to the starting position.
  19. Landscape analysis
     Experiment     Initial distance to solution (mean)   Final distance to solution (mean)   Error red. (mean)
     1% perturbed   3.4·10^-4                             2.8·10^-4                           88%
     random         0.1                                   0.11                                97%
     Conclusion: many local minima.
     Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
  20. Does the model over-fit?
     • Cross-validation (5-fold)
     Experiment   Mean RMS error (SE)   Mean CC (SE)
     training     13.39 (0.004)         0.92 (4.8·10^-5)
     testing      14.04 (0.005)         0.91 (5.7·10^-5)
     • Redundancy reduction
       – Not enough data to begin with
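For readers unfamiliar with the procedure, 5-fold cross-validation can be sketched as follows. The `fit` and `error` callables are hypothetical stand-ins for training the thermodynamic model and scoring it; the slide does not describe how the 400 data points were assigned to folds, so the shuffled split here is an assumption.

```python
import random

def k_fold_indices(n_points, k=5, seed=0):
    """Shuffle the indices 0..n_points-1 and deal them into k folds."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(data, fit, error, k=5, seed=0):
    """Return (mean training error, mean testing error) over k folds."""
    train_errors, test_errors = [], []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = fit(train)  # train on the other k-1 folds only
        train_errors.append(error(model, train))
        test_errors.append(error(model, test))
    return sum(train_errors) / k, sum(test_errors) / k
```

A testing error clearly above the training error indicates over-fitting; in the table above the two are close (13.39 versus 14.04).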
  21. Summary: Optimization & Analysis
     • The objective function is ill-posed.
       – It has a plethora of local minima.
       – It might have many global minima.
     • Hence SA is the method of choice.
     • There might be a tendency to over-fit the data.
     Image credits: http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html ; http://images.nciku.com/
  22. Research Goals
     • Optimize thermodynamic models efficiently
     • Analyze robustness of these models
     • Explore the regulation of a particular gene
     • Examine how the regulatory program evolves
     • Extend the current thermodynamic model
     Image credit: Cooperphoto/CORBIS
  23. Regulation and Evolution of eve
     • The mechanism for regulating eve is conserved:
       – Stripe 2 elements from other Drosophila species activate eve in D. mel. correctly,
       – despite the substantial difference in the regulatory DNA sequence.
     Image credit: http://www.bio.ilstu.edu/Edwards/
     Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
  24. Evaluate Evolution of MSE2
     • Test if the model can identify the MSE2 in these other species.
     • Test if the model correctly predicts the transcriptional output of the homologous MSE2s.
  25. Searching for MSE2
     • Apply a model trained on D. mel. MSE2 to the TFBS-map from sequential windows to find the MSE2 in other species.
     [Figure: a window slides along the region upstream of the eve promoter in another species; for each window the predicted expression is compared to the target, giving a vector of RMS errors < 23 27 43 … 13 … >.]
     Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
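The window search described on this slide amounts to scoring every window and keeping the one with the lowest error. A minimal sketch, assuming a hypothetical `score_window(start, end)` callable that runs the trained model on the binding sites inside the window and returns its RMS error against the target; window size and step are free choices not given on the slide.

```python
def scan_windows(sequence_length, window, step, score_window):
    """Score sequential windows; return (best (start, error), all results)."""
    results = []
    for start in range(0, sequence_length - window + 1, step):
        results.append((start, score_window(start, start + window)))
    best = min(results, key=lambda r: r[1])  # lowest RMS error wins
    return best, results
```

The full list of results is returned alongside the best window because the error profile along the genome is what the next slide plots.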
  26. Searching for MSE2: Result
     • Correctly identified the MSE2 in 6/8 species
     [Figure: RMS error against genomic location; panels include D. melanogaster, D. pseudoobscura, and "rimshawi".]
     Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
  27. Predicting the output in other species
     • Apply a model trained on D. mel. MSE2 to the MSE2s in other species.
     [Figure, binding site maps for D. melanogaster and D. mojavensis: log odds score (bits) against rel. genomic position for bicoid, kruppel, giant, hunchback, knirps, caudal, and tailless.]
     [Figure, expression: relative RNA concentration against A-P position (%) for the target, D. melanogaster, D. pseudoobscura, D. ananassae, and D. mojavensis.]
     Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
  28. Summary: Application
     • The model fits the data qualitatively.
     • Predictions are biologically meaningful.
     • However, there is room for improvement.
  29. Research Goals
     • Optimize thermodynamic models efficiently
     • Analyze robustness of these models
     • Explore the regulation of a particular gene
     • Examine how the regulatory program evolves
     • Extend the current thermodynamic model
     Image credit: Cooperphoto/CORBIS
  30. One role fits them all?
     • A dual function is proposed for some of the regulatory TFs.
       – E.g. TF Hunchback (Hb) might be an activator when regulating stripe 2 and a repressor for stripe 3.
     [Figure: eve regulatory elements, in order: late1, 3+7, 2, P, late2, 4+6, 1, 5.]
     Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
     Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
  31. Determine the regulatory role of TFs
     • Different data set: 44 CRMs important for D. mel. development, but the same set of TFs.
     • Determine the best role for each TF in each of the CRMs
       – Brute force: train a model for all TF role-combinations on each of the 44 CRMs.
       – Record the correlation achieved.
       – Identify TFs that have a dual function.
     Segal, E. et al. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 2008, 451, 535-540
     Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
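The brute-force procedure on this slide can be sketched as an exhaustive loop over role assignments. This is an illustration under assumptions: `score_roles` is a hypothetical callable that trains a model with the given roles on one CRM and returns the correlation achieved, and a TF is called dual-functioning here simply when its best role differs between CRMs, which may be cruder than the criterion used in the study.

```python
from itertools import product

def best_roles(tfs, score_roles):
    """Try every activator (+) / repressor (-) assignment; keep the best."""
    best = None
    for combo in product("+-", repeat=len(tfs)):
        roles = dict(zip(tfs, combo))
        cc = score_roles(roles)  # correlation achieved with these roles
        if best is None or cc > best[1]:
            best = (roles, cc)
    return best

def dual_function_tfs(best_roles_per_crm):
    """TFs whose best role is not the same in every CRM."""
    seen = {}
    for roles in best_roles_per_crm.values():
        for tf, role in roles.items():
            seen.setdefault(tf, set()).add(role)
    return sorted(tf for tf, found in seen.items() if len(found) > 1)
```

With n TFs there are 2^n assignments per CRM, so the exhaustive search is only practical because the TF set is small.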
  32. TFs with dual role
                             Bcd  Cad  Hb  Tll  Gt   Kr  Kni  TorRE
     Det. roles              s    +    s   -    s    s   -    s
     Literature (consensus)  +    +    s   -    (s)  s   -    NA
     "s": dual-functioning, "+": activator, "-": repressor.
     • E.g. Hb
       – Activator for 17 CRMs
       – Repressor for 27 CRMs
     Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol, 2006, 2, e51
     Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
  33. Improvement with dual function
     Experiment       Number of free parameters   Mean CC (SE)
     Previous roles   18                          0.27 (0.008)
     HbDual           19                          0.35 (0.009)
     KrDual           19                          0.37 (0.007)
     HbKrDual         20                          0.38 (0.007)
     [Figure: mRNA against AP position for the CRMs kr_CD1_ru, hb_anterior_actv, run_stripe5, and eve_37ext_ru; curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]
     Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
  34. Marker motifs for dual function
     • Running MEME on the protein sequences of dual-functioning TFs to find short motifs (<6 aa) present in all of them.
     [Figure: two MEME sequence logos (bits, positions 1-4; MEME (no SSC), 15.07.09), one of them labelled as a SUMOylation motif.]
  35. SUMOylation
     • Small Ubiquitin-related Modifier: a small protein covalently attached to target proteins.
     • Involved in many pathways/mechanisms
       – Compartmentalisation
       – Transcriptional regulation
     • Can reverse the function of a TF, e.g. Ikaros (the human homologue of Kr)
     • SUMO (Smt3) is present in D. mel. during development
     [Figure: SUMO pathway. Labels: SUMO (SU), SUMO protease, ATP, E1 activating enzyme, E2 conjugating enzyme, E3 ligase, target protein.]
     Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
     del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005, 25, 2688-2697
  36. Conclusion
     • Thermodynamic models are best optimized using SA, but over-fitting is an issue to keep in mind.
       Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
     • Nonetheless, they are applicable for
       – examining the mechanisms of transcriptional regulation,
       – exploring the evolution of a particular regulatory mechanism.
       Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
     • Model prediction improves when dual function is allowed.
       Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
       – SUMOylation seems to be a good candidate for the biological mechanism of role change.
       Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
  37. Acknowledgments
     • IMB
       – Timothy Bailey (supervisor)
       – Mikael Bodén (supervisor)
       – Sean Grimmond (thesis committee)
       – Nick Hamilton (thesis committee)
       – Fabian Buske
       – Stefan Maetschke
     • Stony Brook University
       – John Reinitz
     • Funding
       – Institute for Molecular Bioscience, The University of Queensland
       – Australian Research Council Centre of Excellence in Bioinformatics
       – National Institutes of Health
       – UQ International Research Tuition Award
     STREAM: framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene. www.bioinformatics.org.au/stream
  38. www.bioinformatics.org.au/stream
     • Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene.
     • Publicly available
     • Modular: new functions can be plugged in
     [Figure labels: Many functions, Command line.]
     Bauer, D. C. and Bailey, T. L. STREAM - Static Thermodynamic REgulAtory Model for transcriptional. Bioinformatics, 2008, 24, 2544-2545.
