Transcript

  • 1. Gene, Proteins, and Genetic Code
  • 2. Protein Synthesis in a Cell
  • 3. Protein and Amino Acids
  • 4. Protein
  • 5. Protein GOT Ecoli
  • 6. A protein sequence
    • >gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
    • MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
    • IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
    • EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
    • SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
    • SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
    • VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
    • FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
    • TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
    • VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
    A protein sequence may contain from a few hundred to several thousand amino acids.
  • 7. Protein synthesis
  • 8. Genetic code: … ATT CAC AGT GGA … encodes … I H S G …
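The slide's example (ATT CAC AGT GGA read as I H S G) can be reproduced with a small lookup table. The sketch below assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative and do not come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string codon by codon, starting at offset `frame`.

    Codons containing characters other than A, C, G, T become 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATTCACAGTGGA"))  # IHSG
```

Changing `frame` to 1 or 2 shows how the same DNA reads differently in the other two reading frames.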
  • 9. Notes on translation
    • Three Reading frames
    • Third base often not important (codons that differ only in the third base frequently encode the same amino acid)
    • 5’ -> 3’
    • Start and end codon
      • Open Reading Frame (ORF)
      • Each gene is an ORF, but not all ORFs are genes.
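The notes above (three reading frames, start and stop codons, ORFs) can be sketched as a simple scan. This is a minimal illustration, not a gene finder: it assumes ATG as the only start codon, looks at the forward strand only, and the function name `find_orfs` and the `min_codons` cutoff are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ORF on the forward strand.

    An ORF here runs from the first ATG seen in a frame to the next
    in-frame stop codon. `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop (assumed cutoff).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close it and keep scanning
    return orfs


print(find_orfs("CCATGAAATTTTGA"))  # [(2, 2, 14)]
```

Every hit is only a candidate: as the slide says, not all ORFs are genes, so the later slides add promoter, content, and similarity evidence.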
  • 10. The Central Dogma of Molecular Biology [diagram: DNA → RNA → Protein; labels: replication, transcription, translation, genotype, phenotype]
  • 11. Exception – retroviruses [diagram: DNA, RNA, Protein; labels: replication, transcription, translation, genotype, phenotype]
  • 12. Biology Phenotype Protein DNA (Genotype)
  • 13. Genes
    • One gene encodes one protein (or sometimes RNA).
    • Like a program, a gene starts with a start codon (e.g. ATG); each following group of three bases codes for one amino acid, and a stop codon (e.g. TGA) marks the end of the gene.
    • Genes are dense in prokaryotes and sparse in eukaryotes.
    • In the middle of a eukaryotic gene, there are introns that are spliced out (as junk) after transcription. The parts that remain are called exons. Identifying them is the task of gene finding.
  • 14. Gene related diseases
    • Hemophilia: on X chromosome.
    • Sickle-cell anemia: a single-nucleotide mutation in the first exon of the beta-globin gene (it removes a restriction enzyme cutting site). 1 in 12 African Americans is a carrier; homozygotes have the disease.
    • BRCA1 gene (chr. 17q) – responsible for ½ of inherited breast cancer (10% of breast cancer)
    • Fragile X syndrome (intellectual disability) – 1 in 1250 males, 1 in 2500 females (dominant, but females partially express the normal copy of the gene). FMR-1 gene: more than 200 trinucleotide repeats cause the disease.
    • P53 gene: chr. 17p, tumor suppressor protein.
  • 15. Genetic Test
    • Example: http://www.myriad.com/index.php
    • Pros and cons:
      • May help avoid the disease or diagnose it early.
      • Can make you unhappier.
      • Can help insurance companies discriminate against carriers of the defective gene.
      • ……
  • 16. Possible ways to test a gene
    • First amplify the gene by PCR, then
      • Sequence it
      • Measure its length
      • Cut it with a restriction enzyme
    • Or
      • Use a PCR primer at the mutation site.
  • 17.  
  • 18. Gene Prediction and Annotation Prokaryotes
    • Start/stop codon (ORF)
    • Promoters
    • Content
    • Sequence similarity
  • 19.  
  • 20. Start Codon
    • May miss short genes.
    • Do not know which start codon to use.
    • Overlapping ORFs in different reading frames.
  • 21. Promoters
    • <-- upstream downstream -->
    • 5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3‘
    • -35 -10 Gene to be transcribed
    Consensus bases and how often each occurs at its position:
    -10 (Pribnow box):  T 77%   A 76%   T 60%   A 61%   A 56%   T 82%
    -35:                T 69%   T 79%   G 61%   A 56%   C 54%   A 54%
    In prokaryotes, the promoter consists of two short sequences at the -10 and -35 positions upstream of the gene, that is, prior to the gene in the direction of transcription.
    The sequence at -10 is called the Pribnow box and usually consists of the six nucleotides TATAAT. The Pribnow box is absolutely essential to start transcription in prokaryotes.
    The other sequence at -35 usually consists of the six nucleotides TTGACA. Its presence allows a very high transcription rate.
    These rules are only approximately correct.
  • 22. Scoring a 6-mer as Pribnow box
    • Computers work with exact formulas, not English descriptions.
    • We need a “score function” to measure the likelihood that a 6-mer is a Pribnow box.
  • 23. An example function for evaluating Pribnow box fitness: log()
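The formula on this slide did not survive in the transcript, so the following is only one plausible score function, not necessarily the one shown. It is a log-odds sum built from the -10 consensus frequencies on slide 21. Two assumptions are made here: the three non-consensus bases at each position share the remaining probability equally, and the background probability of every base is 0.25. The name `pribnow_score` is illustrative.

```python
import math

# Consensus base and its observed frequency at each -10 position (slide 21).
CONSENSUS = "TATAAT"
CONSENSUS_FREQ = [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]
BACKGROUND = 0.25  # assumed uniform background base probability


def pribnow_score(hexamer):
    """Log-odds score of a 6-mer against the Pribnow box consensus.

    Positive scores mean the 6-mer looks more like a Pribnow box than
    like random sequence under the assumptions stated above.
    """
    hexamer = hexamer.upper()
    if len(hexamer) != 6:
        raise ValueError("expected a 6-mer")
    score = 0.0
    for base, consensus_base, p in zip(hexamer, CONSENSUS, CONSENSUS_FREQ):
        # Assumption: non-consensus bases split the leftover probability evenly.
        prob = p if base == consensus_base else (1.0 - p) / 3.0
        score += math.log(prob / BACKGROUND)
    return score


for candidate in ("TATAAT", "TATGAT", "GCGCGC"):
    print(candidate, round(pribnow_score(candidate), 2))
```

Sliding this function along the region upstream of a candidate ORF and keeping the best-scoring window is one way to turn the English description of the promoter into a number.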
  • 24. Content I – codon bias
    • A codon XYZ occurs with different frequencies in coding regions and non-coding regions
      • Different amino acids have different frequencies.
      • Different codons for the same amino acid have different frequencies.
      • In non-coding regions the frequency is approximately p(X)*p(Y)*p(Z)
  • 25. http://www.kazusa.or.jp/codon/
  • 26. Codon bias
    • First, use many known genes of the organism or of similar organisms to train a codon frequency table.
      • Each codon c_i has a frequency f(c_i).
    • Second, compute the background frequency bf(X) of each base X = A, C, G, T.
    • The “significance” of a codon c = XYZ is then
      • log( f(c) / (bf(X)*bf(Y)*bf(Z)) ).
    • A high average significance in a region is an indication of a gene.
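The codon-bias recipe above can be sketched in a few lines. The function names are illustrative. The pseudocount used to avoid log(0) for codons never seen in training is an added assumption, not part of the slide, and the background sequence is assumed to contain all four bases.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]


def codon_frequencies(genes, pseudocount=1.0):
    """Train f(c) from known genes, each read in frame from its first base."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper()
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def base_frequencies(sequence):
    """Background frequency bf(X) of each base X in A, C, G, T."""
    counts = Counter(b for b in sequence.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def significance(codon, f, bf):
    """log( f(c) / (bf(X)*bf(Y)*bf(Z)) ) for codon c = XYZ."""
    x, y, z = codon
    return math.log(f[codon] / (bf[x] * bf[y] * bf[z]))


def average_significance(region, f, bf):
    """Mean codon significance of a region read in frame 0."""
    region = region.upper()
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    codons = [c for c in codons if c in f]  # skip codons with ambiguous bases
    return sum(significance(c, f, bf) for c in codons) / len(codons)


# Toy data, purely illustrative.
f = codon_frequencies(["ATGGCCGCCGCCTGA"] * 10)
bf = base_frequencies("ACGT" * 25)
print(average_significance("GCCGCCGCC", f, bf) > average_significance("TTTTTTTTT", f, bf))
```

In practice the average would be computed in a sliding window and in each reading frame, and windows with a high average would be flagged as likely coding.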
  • 27.  
  • 28. Content II - Hidden Markov Model (HMM)
  • 29. Eukaryotes
    • Basic idea similar to Prokaryotes
    • Difference:
  • 30. DNA-specific transcription factors
    • These are the basis of gene-regulatory networks
      • Another hot area in Bioinformatics
  • 31. Splicing
    • Consensus sequences have been identified as necessary but not sufficient for splicing. In vertebrates, these sequences are (the slash identifies the exon-intron or intron-exon junction):
      • C(or A)AG/GTA(or G)AGT "donor" splice site
      • T(or C)nNC(or T)AG/G "acceptor" splice site.
      • A third sequence, which in yeast is TACTAAC , is necessary within the intron sequence.
    These rules are only approximately correct.
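The two vertebrate consensus patterns above translate naturally into regular expressions. The sketch below only illustrates the pattern matching; since the consensus is necessary but not sufficient, every hit is a candidate, not a confirmed splice site. The slide does not give a value for n in the acceptor pattern, so the minimum pyrimidine run length `min_run` is an assumption, and the function names are illustrative.

```python
import re

# Donor: (C or A)AG / GT(A or G)AGT. The lookahead lets overlapping hits be found.
DONOR = re.compile(r"(?=([CA]AGGT[AG]AGT))")


def donor_sites(dna):
    """Return indices of the first intron base (the G of GT) for each candidate."""
    return [m.start(1) + 3 for m in DONOR.finditer(dna.upper())]


def acceptor_sites(dna, min_run=4):
    """Return indices of the first exon base after each candidate AG/G junction.

    Pattern: (T or C) repeated at least `min_run` times, any base N,
    (C or T), then AG/G. `min_run` is an assumed value for n.
    """
    pattern = re.compile(r"(?=([TC]{%d,}[ACGT][CT]AGG))" % min_run)
    # Several start positions in one pyrimidine run give the same junction,
    # so collect the junctions in a set.
    junctions = {m.end(1) - 1 for m in pattern.finditer(dna.upper())}
    return sorted(junctions)


# Toy sequence: exon, donor site, intron with a pyrimidine run, acceptor site, exon.
toy = "AAA" + "CAG" + "GTAAGT" + "CCC" + "T" * 10 + "GCAG" + "G" + "AAA"
print(donor_sites(toy), acceptor_sites(toy))  # [6] [29]
```

Pairing each donor candidate with a downstream acceptor candidate gives possible introns, which a gene finder would then rank using the other evidence discussed in these slides.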
  • 32.  
  • 33.  
  • 34.  
  • 35.  
  • 36.  
  • 37. Gene Prediction Software
    • Try GENSCAN at http://genes.mit.edu/GENSCAN.html using the sequence at
      • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=3253144
    • Did GENSCAN work well?