Gene finding
Transcript

  • 1. Gene, Proteins, and Genetic Code
  • 2. Protein Synthesis in a Cell
  • 3. Protein and Amino Acids
  • 4. Protein
  • 5. Protein: GOT (E. coli)
  • 6. A protein sequence
    • >gi|7228451|dbj|BAA92411.1| EST AU055734(S20025) corresponds to a region …
    • MCSYIRYDTPKLFTHVTKTPPKNQVSNSINDVGSRRATDRSVASCSSEKSVGTMSVKNASSISFEDIEKSISNWKIPKVN
    • IKEIYHVDTDIHKVLTLNLQTSGYELELGSENISVTYRVYYKAMTTLAPCAKHYTPKGLTTLLQTNPNNRCTTPKTLKWD
    • EITLPEKWVLSQAVEPKSMDQSEVESLIETPDGDVEITFASKQKAFLQSRPSVSLDSRPRTKPQNVVYATYEDNSDEPSI
    • SDFDINVIELDVGFVIAIEEDEFEIDKDLLKKELRLQKNRPKMKRYFERVDEPFRLKIRELWHKEMREQRKNIFFFDWYE
    • SSQVRHFEEFFKGKNMMKKEQKSEAEDLTVIKKVSTEWETTSGNKSSSSQSVSPMFVPTIDPNIKLGKQKAFGPAISEEL
    • VSELALKLNNLKVNKNINEISDNEKYDMVNKIFKPSTLTSTTRNYYPRPTYADLQFEEMPQIQNMTYYNGKEIVEWNLDG
    • FTEYQIFTLCHQMIMYANACIANGNKEREAANMIVIGFSGQLKGWWNNYLNETQRQEILCAVKRDDQGRPLPDRDGNGNP
    • TELKEGFHMEEKDEPIQEDDQVVGTIQKYTKQKWYAEVMYRFIDGSYFQHITLIDSGADVNCIREDEILDQLVQTKREQV
    • VNSIYLHDNSFPKSMDLPDQKITEKRAKLQDIPHHEERLLDYREKKSRDGQDKLPMEVEQSMATNKNTKILLRAWLLST
    A protein sequence may contain from a few hundred to several thousand amino acids.
  • 7. Protein synthesis
  • 8. Genetic code: . . ATT CAC AGT GGA . . → I H S G
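The codon-to-amino-acid mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical and not from the slides.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string codon by codon, starting at its first base.

    '*' marks a stop codon; 'X' marks a codon with an unknown base.
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )

print(translate("ATTCACAGTGGA"))  # IHSG
```

The example string is the one shown on the slide, read in the frame that starts at its first base.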
  • 9. Notes on translation
    • Three Reading frames
    • Third base not important
    • 5’ -> 3’
    • Start and end codon
      • Open Reading Frame (ORF)
      • Each gene is an ORF, but not all ORF are genes.
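To make the ORF idea concrete, here is a hedged sketch that scans the three reading frames of one strand for stretches running from ATG to the next in-frame stop codon. The function name `find_orfs` and the `min_codons` parameter are assumptions made for illustration. A real gene finder would also scan the reverse-complement strand and consider alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ORF on the given strand.

    start is the index of the A in ATG; end is exclusive and includes
    the stop codon. Within a frame, the first ATG after the previous
    stop is used, so each reported ORF is the longest one for its stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

print(find_orfs("CATGCCCTGAAA"))  # [(1, 1, 10)]
```

Raising `min_codons` filters out short ORFs that arise by chance, at the cost of missing genuinely short genes.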
  • 10. The Central Dogma of Molecular Biology: DNA (replication) → RNA (transcription) → Protein (translation); genotype → phenotype
  • 11. Exception – retroviruses (diagram labels: DNA, RNA, Protein; transcription, translation, replication; genotype, phenotype)
  • 12. Biology Phenotype Protein DNA (Genotype)
  • 13. Genes
    • One gene encodes one protein (or sometimes RNA).
    • Like a program, a gene starts with a start codon (e.g. ATG); then each group of three bases codes for one amino acid, until a stop codon (e.g. TGA) signals the end of the gene.
    • Genes are dense in prokaryotes and sparse in eukaryotes.
    • In the middle of a eukaryotic gene, there are introns that are spliced out (as junk) after transcription. The parts that are kept are called exons. Locating genes, including their exons and introns, is the task of gene finding.
  • 14. Gene related diseases
    • Hemophilia: on X chromosome.
    • Sickle-cell anemia: a single-nucleotide mutation in the first exon of the beta-globin gene (it removes a restriction enzyme cutting site). 1 in 12 African Americans is a carrier; only homozygotes are sick.
    • BRCA1 gene (chr. 17q) – responsible for about half of inherited breast cancer (10% of breast cancer)
    • Fragile X syndrome (intellectual disability) – 1 in 1250 males, 1 in 2500 females (dominant, but females have a partially expressed good copy of the gene). FMR-1 gene: more than 200 trinucleotide repeats cause the disease.
    • P53 gene: chr. 17p, encodes a tumor suppressor protein.
  • 15. Genetic Test
    • Example: http://www.myriad.com/index.php
    • Pros and cons:
      • May make it possible to avoid the disease or diagnose it early.
      • Can make you unhappier.
      • Can help insurance companies discriminate against carriers of a defective gene.
      • ……
  • 16. Possible ways of gene testing
    • First amplify the gene by PCR, then
      • Sequence it
      • Measure its length
      • Digest it with a restriction enzyme
    • Or
      • Design a PCR primer at the mutation site.
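The restriction enzyme test can be sketched as follows: if a mutation destroys a cutting site, the amplified fragment is no longer cut, and the fragment lengths change. The sequences below are made up for illustration, and the recognition pattern (CCTNAGG, cut after the second base) is an assumption used as an example rather than something stated on the slides.

```python
import re

def fragment_lengths(dna, site_regex, cut_offset):
    """Cut dna at every match of site_regex and return the fragment lengths.

    The cut is placed cut_offset bases after the start of each match.
    """
    cuts = [m.start() + cut_offset for m in re.finditer(site_regex, dna)]
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical amplified fragments: the mutant differs by one base inside the site.
NORMAL = "AAAAACCTGAGGTTTTT"
MUTANT = "AAAAACCTGTGGTTTTT"
SITE = r"CCT[ACGT]AGG"  # assumed recognition pattern, for illustration only

print(fragment_lengths(NORMAL, SITE, 2))  # [7, 10]
print(fragment_lengths(MUTANT, SITE, 2))  # [17]
```

Two fragments for the normal sequence and one uncut fragment for the mutant are what the "measure the length" step would then distinguish.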
  • 17.  
  • 18. Gene Prediction and Annotation Prokaryotes
    • Start/stop codon (ORF)
    • Promoters
    • Content
    • Sequence similarity
  • 19.  
  • 20. Start Codon
    • May miss short genes.
    • Do not know which start codon to use.
    • Overlapping ORFs in different reading frames.
  • 21. Promoters
    • <-- upstream downstream -->
    • 5'-XXXXPPPPPPXXXXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGXXXX-3'
    • The two P blocks are the -35 and -10 sequences; the G block is the gene to be transcribed.
    • Consensus bases and how often each occurs at its position:
      • -10 (Pribnow box): T 77%, A 76%, T 60%, A 61%, A 56%, T 82%
      • -35: T 69%, T 79%, G 61%, A 56%, C 54%, A 54%
    • In prokaryotes, the promoter consists of two short sequences at positions -10 and -35 upstream of the gene, that is, prior to the gene in the direction of transcription. The sequence at -10 is called the Pribnow box and usually consists of the six nucleotides TATAAT. The Pribnow box is absolutely essential to start transcription in prokaryotes. The other sequence at -35 usually consists of the six nucleotides TTGACA. Its presence allows a very high transcription rate. These rules are only approximately correct.
  • 22. Scoring a 6-mer as Pribnow box
    • Computers work with exact formulae, not English descriptions.
    • We need a “score function” to measure the likelihood that a 6-mer is a Pribnow box
  • 23. An example function for Pribnow box fitness evaluation: log()
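One possible score function, offered as a sketch and not necessarily the formula on the slide, is a log-odds sum over the six positions. It uses the consensus frequencies listed for the -10 sequence (77%, 76%, 60%, 61%, 56%, 82%) and makes three assumptions: positions are independent, the three non-consensus bases at a position share the remaining probability equally, and the background probability of each base is 0.25. The names `pribnow_score` and `best_pribnow_site` are hypothetical.

```python
import math

CONSENSUS = "TATAAT"
CONSENSUS_FREQ = [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]
BACKGROUND = 0.25  # assumed uniform base composition

def pribnow_score(sixmer):
    """Log-odds score of a 6-mer being a Pribnow box (higher is better)."""
    if len(sixmer) != 6:
        raise ValueError("expected a 6-mer")
    score = 0.0
    for base, cons, freq in zip(sixmer.upper(), CONSENSUS, CONSENSUS_FREQ):
        # Assumption: non-consensus bases split the leftover probability equally.
        p = freq if base == cons else (1.0 - freq) / 3.0
        score += math.log(p / BACKGROUND)
    return score

def best_pribnow_site(dna):
    """Return (score, position) of the best-scoring 6-mer window in dna."""
    return max((pribnow_score(dna[i:i + 6]), i) for i in range(len(dna) - 5))

score, position = best_pribnow_site("GGGGTATAATGGGG")
print(position)  # 4
```

In practice the per-position probabilities would come from a full table of base counts in known promoters rather than from the consensus frequencies alone.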
  • 24. Content I – codon bias
    • A codon XYZ occurs with different frequencies in coding regions and non-coding regions
      • Different amino acids have different frequencies.
      • Different codons for the same amino acid have different frequencies.
      • In non-coding regions the frequency is approximately p(X)*p(Y)*p(Z)
  • 25. http://www.kazusa.or.jp/codon/
  • 26. Codon bias
    • First, use many known genes of the organism or of similar organisms to train a codon frequency table.
      • Each codon c_i has a frequency f(c_i).
    • Second, compute the background frequency bf(X) of each base X = A, C, G, T
    • The “significance” of a codon c = XYZ is then
      • log( f(c) / (bf(X)*bf(Y)*bf(Z)) )
    • A high average significance in a region indicates a gene.
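The two training steps and the scoring step can be sketched as below. The sketch takes the significance as the log ratio log(f(c) / (bf(X)*bf(Y)*bf(Z))), so that positive values mean the codon is more common in genes than the background predicts; this sign convention is an assumption chosen so that a high average points to a gene. The function names and the small floor `pseudo` for unseen codons are also assumptions.

```python
import math
from collections import Counter

def codon_frequencies(genes):
    """Train f(c): relative frequency of each codon in known in-frame genes."""
    counts = Counter()
    for gene in genes:
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def base_frequencies(dna):
    """Background frequency bf(X) of each base X in A, C, G, T."""
    counts = Counter(dna)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def codon_significance(codon, f, bf, pseudo=1e-6):
    """log(f(c) / (bf(X)*bf(Y)*bf(Z))); pseudo avoids log(0) for unseen codons."""
    expected = bf[codon[0]] * bf[codon[1]] * bf[codon[2]]
    return math.log(max(f.get(codon, 0.0), pseudo) / expected)

def average_significance(region, f, bf):
    """Mean codon significance of a region, read in frame from its first base."""
    codons = [region[i:i + 3] for i in range(0, len(region) - 2, 3)]
    return sum(codon_significance(c, f, bf) for c in codons) / len(codons)

# Toy training data, made up for illustration.
f = codon_frequencies(["ATGGCTGCTGCTTAA", "ATGGCTAAAGCTTGA"])
bf = base_frequencies("AACCGGTT")
print(average_significance("GCTGCTGCT", f, bf) > average_significance("CCCCCCCCC", f, bf))  # True
```

A gene finder would slide a window along the genome in each reading frame and flag windows whose average stays high.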
  • 27.  
  • 28. Content II - Hidden Markov Model (HMM)
  • 29. Eukaryotes
    • The basic idea is similar to that for prokaryotes
    • Difference:
  • 30. DNA-specific transcription factors
    • These are the basis of gene-regulatory networks
      • Another hot area in bioinformatics
  • 31. Splicing
    • Consensus sequences have been identified as necessary but not sufficient for splicing. In vertebrates, these sequences are (the slash identifies the exon-intron or intron-exon junction):
      • (C or A)AG/GT(A or G)AGT "donor" splice site
      • (T or C)nN(C or T)AG/G "acceptor" splice site.
      • A third sequence, which in yeast is TACTAAC, is necessary within the intron sequence.
    These rules are only approximately correct.
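The two consensus patterns translate directly into regular expressions. The sketch below reports candidate junction positions only; because the consensus is necessary but not sufficient, a real predictor would score candidates instead of accepting every match. The function names and the `min_run` parameter (the minimum length assumed for the pyrimidine run written as "n") are hypothetical.

```python
import re

def donor_sites(dna):
    """Candidate exon/intron junctions: index of the first intron base (the G of GT)."""
    pattern = r"(?=[CA]AGGT[AG]AGT)"  # lookahead so overlapping matches are found
    return [m.start() + 3 for m in re.finditer(pattern, dna.upper())]

def acceptor_sites(dna, min_run=4):
    """Candidate intron/exon junctions: index of the first exon base (the G after AG)."""
    pattern = r"(?=[TC]{%d}[ACGT][CT]AGG)" % min_run
    return [m.start() + min_run + 4 for m in re.finditer(pattern, dna.upper())]

# Made-up sequence: exon CCCAAG, intron GTAAGT...TTTTTTGCAG, exon GCC.
dna = "CCCAAGGTAAGTAAAATTTTTTGCAGGCC"
donor = donor_sites(dna)[0]
acceptor = acceptor_sites(dna)[0]
print(donor, acceptor)               # 6 26
print(dna[:donor] + dna[acceptor:])  # CCCAAGGCC
```

Removing the bases between a donor and an acceptor position gives the spliced sequence, as the last line shows for this toy example.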
  • 32.  
  • 33.  
  • 34.  
  • 35.  
  • 36.  
  • 37. Gene Prediction Software
    • Try GENSCAN at http://genes.mit.edu/GENSCAN.html using the sequence at
      • http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=3253144
    • Did GENSCAN work well?