Maximizing hidden stop codon on gene design

Khaled Monsoor's presentation on a paper on "Maximizing hidden stop codon on gene design"

Published in: Technology
  • H = Histidine, I = Isoleucine
  • Maximizing hidden stop codon on gene design

    1. 1. Synthetic gene design with a large number of hidden stops
        Authors: Phan, V., Saha, S., Pandey, A., Wong, T-Y
        Published in: Intl. Journal of Data Mining and Bioinformatics, Vol. 4, No. 4, 2010
        Presented by: Khaled Monsoor, Bioinformatics Masters Program, The University of Memphis
        Mail: kmonsoor@memphis.edu
        Date: Nov 05, 2010
    2. 2. Overview
        • What?
        • Why?
        • How?
        • Result?
        • Conclusion
        Synthetic gene design with a large number of hidden stops
    7. 7.
    8. 8. Sleeping is a waste of precious time
        Stay awake
        Like him …
    9. 9. What is the paper about?
        • What are hidden stops in genes?
        • Can we “redesign” genes to include more hidden stops?
        • How can clever computer algorithms help us?
    16. 16. Why do we need to?
        It is now feasible to construct artificial genomes.
        Researchers at the J. Craig Venter Institute artificially synthesized the genome of Mycoplasma genitalium (reported in 2008).
        Can we increase the efficiency of protein synthesis in ‘designed’ genes?
        How to increase efficiency:
        • Hidden stops can protect against frame shifts by terminating the shifted translation early
        • Without hidden stops, frame shifts can produce very long non-functional proteins
    17. 17. Universal Genetic Code
        • Dictates what a protein is composed of
        • Has evolved over millions of years
        • A protein is a sequence of amino acids
        • Encodes 20 (twenty) amino acids
    21. 21. Universal Genetic Code
    22. 22. Translation
        mRNA: ATGTCCAAACCT
        Protein: M S K P
    23. 23. Triplets representing P (Proline)
        CCT, CCC, CCA, CCG all represent P (Proline)
        A mutation in the 3rd position does not change the amino acid
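As a minimal sketch (not part of the slides), the translation step and the redundancy of the proline codons can be reproduced in Python. The table is the standard genetic code written in the DNA alphabet; the names `CODON_TABLE`, `translate`, and `synonymous_codons` are illustrative assumptions, not anything taken from the paper.

```python
from itertools import product

# Standard genetic code in the DNA alphabet, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate in frame 0; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


print(translate("ATGTCCAAACCT"))   # the slide's example sequence
print(synonymous_codons("P"))      # the four CCN codons for Proline
```

Because every codon of the form CCN maps to P, changing only the third base of a proline codon leaves the protein unchanged.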
    24. 24. Deletion/Insertion is dangerous
        A deletion creates a frame shift, which changes the entire downstream sequence
        RNA: … CAT.CAT.CAT.CAT …
        Protein: …HHHH… (chain of Histidine)
        After deleting the 3rd character (T): CAC.ATC.ATC.AT
        Protein: HII
        … something totally different!
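The frame shift on this slide can be checked with a short Python sketch. It is only an illustration under the standard genetic code; `delete_base` is a hypothetical helper, and the position is 1-based to match the slide's "3rd character".

```python
from itertools import product

# Standard genetic code in the DNA alphabet (T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate in frame 0; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_base(dna, position):
    """Remove the base at a 1-based position (hypothetical helper)."""
    return dna[:position - 1] + dna[position:]


original = "CATCATCATCAT"
shifted = delete_base(original, 3)   # drop the first T

print(original, translate(original))
print(shifted, translate(shifted))
```

A single deleted base changes every codon downstream of it, which is why the histidine chain turns into a different protein.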
    25. 25. Like them … :-(
    26. 26. Regular Expression for a Protein
        (start) (codon)^k (stop)
        Start: ATG
        Stop: TAA, TAG, TGA
        Codon: any triplet not equal to TAA, TAG, or TGA
        Example: ATG.ACC.AAT.CGG.TAA
        The TGA spanning ATG.ACC is a stop codon, but hidden (out of frame)
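The pattern (start)(codon)^k(stop) maps directly onto a regular expression. The following is a sketch using Python's `re` module; `ORF` and `is_protein_coding` are illustrative names, and the dots used on the slide as codon separators are stripped before matching.

```python
import re

STOP = "TAA|TAG|TGA"

# ATG, then any number of non-stop triplets, then exactly one stop codon.
# The negative lookahead (?!...) rejects a triplet that is itself a stop.
ORF = re.compile(rf"ATG(?:(?!{STOP})[ACGT]{{3}})*(?:{STOP})")


def is_protein_coding(dna):
    """True if the whole sequence has the form (start)(codon)^k(stop)."""
    return ORF.fullmatch(dna.replace(".", "")) is not None


print(is_protein_coding("ATG.ACC.AAT.CGG.TAA"))  # the slide's example
print(is_protein_coding("ATG.TAA.ACC.TAA"))      # in-frame stop in the middle
print(is_protein_coding("ATG.ACC.AAT"))          # no stop codon at the end
```

The expression only inspects frame 0, so the hidden TGA in the example does not stop it from matching.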
    27. 27. Why is a hidden stop good?
        Hidden stops can protect against frame shifts by terminating the consequent translation early
        Without hidden stops, frame shifts can produce very long non-functional proteins, which not only waste time, amino acid resources (money), and ATP (energy) but may also yield a deadly toxin
        Ref: Seligmann and Pollock, DNA and Cell Biology, 2004
    33. 33. Goal
        • Design genes with the maximum number of hidden stops
        • Constraints: none, matching GC content, or matching codon usage
    35. 35. Example: protein is MSDSKED
    36. 36. Hidden Stops
        Consider the protein MSDSKED
        Both sequences encode this protein:
        (1) ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA
        (2) ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA
        Sequence (1) is better! It has 4 hidden stops!
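The count of 4 can be verified by scanning the two shifted reading frames. This is a minimal sketch that assumes a hidden stop means any TAA, TAG, or TGA starting at a position that is not a multiple of 3; `hidden_stops` is an illustrative name.

```python
STOPS = {"TAA", "TAG", "TGA"}


def hidden_stops(dna):
    """Return (0-based position, codon) for every stop codon in frames +1 and +2."""
    seq = dna.replace(".", "")
    return [
        (i, seq[i:i + 3])
        for i in range(len(seq) - 2)
        if i % 3 != 0 and seq[i:i + 3] in STOPS
    ]


seq1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"
seq2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"

print(len(hidden_stops(seq1)), hidden_stops(seq1))
print(len(hidden_stops(seq2)), hidden_stops(seq2))
```

In sequence (1) every hidden stop straddles a codon boundary (for example the TGA formed by ATG followed by AGT), while sequence (2) has none.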
    37. 37. Algorithm for No Constraint
        Goal:
        • Given a protein, design a DNA sequence that encodes the protein with the maximum number of hidden stops
    38. 38. Dynamic Programming approach
        Idea: the optimal design of the whole sequence is built from optimal designs of partial sequences
        H(i, j) = number of hidden stops in an optimal design up to the i-th amino acid, A_i, when A_i is coded by its j-th codon
    39. 39. Optimal Substructure of algorithm
        $H(i, j) = \max_k \{ H(i-1, k) + I_{kj} \}$
        The maximum is taken over all codons k coding the previous amino acid, A_{i-1}
        I_{kj} = 1 if the k-th codon of A_{i-1} followed by the j-th codon of A_i contains a hidden stop codon across their boundary, and 0 otherwise
        This formula can be computed recursively in linear time, O(n)
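The recurrence can be sketched in Python as follows. This is an illustration of the unconstrained case only, not the authors' implementation: it assumes the standard genetic code, treats I_kj as the number of off-frame stops in the six bases formed by two adjacent codons, and uses hypothetical names (`boundary_stops`, `design_max_hidden_stops`). A trailing `*` in the protein string stands for the terminal stop codon.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOPS = {"TAA", "TAG", "TGA"}

# amino acid (or '*') -> list of synonymous codons
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def boundary_stops(left, right):
    """I_kj: off-frame stops spanning the boundary of two adjacent codons."""
    joined = left + right
    return sum(joined[s:s + 3] in STOPS for s in (1, 2))


def design_max_hidden_stops(protein):
    """Return (max hidden stops, dot-separated codons) for a non-empty protein."""
    options = [SYNONYMS[aa] for aa in protein]
    score = [[0] * len(options[0])]      # score[i][j] = H(i, j)
    back = [[None] * len(options[0])]    # back[i][j] = best k for H(i, j)
    for i in range(1, len(options)):
        row, ptr = [], []
        for cur in options[i]:
            cands = [score[i - 1][k] + boundary_stops(prev, cur)
                     for k, prev in enumerate(options[i - 1])]
            k_best = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[k_best])
            ptr.append(k_best)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final codon.
    j = max(range(len(score[-1])), key=score[-1].__getitem__)
    total = score[-1][j]
    codons = []
    for i in range(len(options) - 1, -1, -1):
        codons.append(options[i][j])
        j = back[i][j]
    return total, ".".join(reversed(codons))


total, design = design_max_hidden_stops("MSDSKED*")
print(total, design)   # total is 4 for this protein
```

Each amino acid has at most six codons, so every step compares a bounded number of codon pairs and the running time grows linearly with the protein length. For MSDSKED the maximum is 4, which is the count of the slide's sequence (1).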
    40. 40. Strategy: Back Translation
        Protein → DNA
        This is a 1-to-many mapping
        Back translation should:
        • Satisfy constraints imposed by host genomes
        • Serve a specific design purpose
    41. 41. Main algorithm (2 parts)
    42. 42. Constrained by GC Content
        GC content = number of G and C bases in the sequence
        GC content relates to the stability of DNA
        Algorithm’s objectives:
        • maximize the number of hidden stops,
        • then match the GC content of the host genome
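GC content as defined on this slide is a simple count. The sketch below computes it, along with the fraction of the sequence it represents, for the two MSDSKED designs shown earlier. The function names are illustrative, and the fitting step of the algorithm itself is not reproduced here.

```python
def gc_count(dna):
    """Number of G and C bases; dots used as codon separators are ignored."""
    seq = dna.replace(".", "").upper()
    return sum(base in "GC" for base in seq)


def gc_fraction(dna):
    """GC count divided by sequence length."""
    seq = dna.replace(".", "")
    return gc_count(seq) / len(seq)


seq1 = "ATG.AGT.GAT.AGT.AAA.GAA.GAC.TAA"   # design with 4 hidden stops
seq2 = "ATG.TCC.GAT.TCG.AAA.GAA.GAC.TAA"   # design with no hidden stops

print(gc_count(seq1), gc_fraction(seq1))
print(gc_count(seq2), gc_fraction(seq2))
```

The two designs encode the same protein but differ in GC content, which is why maximizing hidden stops and matching a host's GC content can pull in different directions.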
    43. 43. Algorithm considering GC content Constraint and “Fitting” approach
    44. 44. Constrained by Codon Usage
        Algorithm:
        • Construct the sequence with the maximum number of hidden stops
        • “Fit” this sequence to the required codon usage
        Result:
        • Cannot achieve both the maximum number of hidden stops and a match to the codon usage
        • Still “better” than wild-type genes
    47. 47. For a particular amino acid, triplets are not distributed uniformly
        For Leucine, codon CUG is used 51% of the time in E. coli.
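Codon usage of this kind can be measured as the share of each codon among all codons for the same amino acid. The following sketch assumes the standard genetic code in the DNA alphabet (so CUG appears as CTG); the leucine-only input is made-up toy data, not E. coli data, and `codon_usage` is an illustrative name.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_usage(dna):
    """Map each codon in frame 0 to its share among codons of the same amino acid."""
    seq = dna.replace(".", "")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}


toy = "CTG.CTG.TTA.CTC"   # four leucine codons, invented for illustration
print(codon_usage(toy))
```

Computed over a whole host genome, such a table gives the target distribution that a designed gene is then fitted to.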
    48. 48. Algorithm considering Codon Usage Constraints
    54. 54. Comparison
        • “Wild type” (genes from NCBI)
        • Random gene (constrained by codon usage of the “wild type”)
        • “Optimal”: design with no constraint (maximum hidden stops)
        • Constrained by GC content of the wild type
        • Constrained by codon usage of the wild type
    55. 55. Genes for re-design study
    56. 56. Overall comparison of all approaches (figure: number of hidden stop codons)
    61. 61. Conclusion
        While maintaining the GC content and codon usage of wild types, the algorithms can propose genes with approximately 10% more hidden stops
        With both constraints maintained, the shape of the distribution graph of the ‘wild-type’ and ‘designed’ genes retains a 98% Pearson correlation
    62. 62. Any questions?
        As a lagging grad student, I’ll try my best to answer …
    63. 63. Thank you for attending this boring presentation … oh
