IICB Course work, 8th Dec 2012
Topics to be covered
• Primer designing
• Restriction mapping
• Gene Prediction
ATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGACG
AGACGCAGAATAGAGGACGAGCAATGATGATAGAAATGACGCATAGCAGCAT
AGACGCATAGACGACGCATAGACGACATAGAGACGAGACGCAGAATAGAGGA
CGAGCAATGATGATAGAAATGACGCATAGCAGCATAGACGCATAGACGACGC
ATAGACGACATAGAGACGAGACGCAGAATAGAGGACGAGCAATGATGATAGA
AATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGAC
GAGACGCAGAATAGAGGACGAGCAATGATGATAGAAATGACGCATAGCAGCA
TAGACGCATAGACGACGCATAGACGACATAGAGACGAGACGCAGAATAGAGG
ACGAGCAATGATGATAGAAATGACGCATAGCAGCATAGACGCATAGACGACG
CATAGACGACATAGAGACGAGACGCAGAATAGAGGACGAGCAATGATGATAG
AAATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGA
CGAGACGCAGAATAGAGGACGAGCAATGATGATAGAA
• Oligo Analyzer:
• http://www.idtdna.com/analyzer/Applications/OligoAnalyzer/
• http://www.idtdna.com/Analyzer/Applications/Instructions/Default.aspx?AnalyzerDefinitions=true
Primer Designing
• No non-specific binding
• Melting temperature (Tm)
• Primers should not form dimers with themselves or with other
  primers.
Some thoughts
  1. Primers should be 17-28 bases in length.
  2. Base composition should be 50-60% (G+C).
  3. Primers should end (3') in a G or C, or CG or GC: this prevents
     "breathing" of ends and increases efficiency of priming.
  4. Tms between 55 and 80 °C are preferred.
  5. The 3'-ends of primers should not be complementary (i.e. base
     pair), as otherwise primer dimers will be synthesised
     preferentially to any other product.
  6. Primer self-complementarity (the ability to form secondary (2°)
     structures such as hairpins) should be avoided.
  7. Runs of three or more Cs or Gs at the 3'-ends of primers may
     promote mispriming at G- or C-rich sequences (because of the
     stability of annealing), and should be avoided.

Adapted from: Innis and Gelfand, 1991
Reference:
http://bioweb.uwlax.edu/genweb/molecular/seq_anal/primer_design/primer_design.ht
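The rules of thumb above can be turned into a quick automated check. The following is a minimal sketch, not a replacement for a tool such as Oligo Analyzer: the function names are hypothetical, the Tm estimate uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption not taken from the slides and only a rough guide, and the 3'-end check looks at the last 4 bases, which is also an arbitrary choice.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(seq.upper()))


def check_primer(seq):
    """Check a single primer against rules 1-4 and 7 of the list above."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    gc_percent = 100.0 * gc / len(seq)
    return {
        "length_ok": 17 <= len(seq) <= 28,           # rule 1
        "gc_percent": gc_percent,
        "gc_ok": 50.0 <= gc_percent <= 60.0,         # rule 2
        "gc_clamp": seq[-1] in "GC",                 # rule 3
        # rule 7: here any mix of G and C in the last three bases counts as a run
        "no_3prime_gc_run": not all(b in "GC" for b in seq[-3:]),
        # Wallace rule (assumption, rough estimate only), compare with rule 4
        "tm_wallace": 2 * at + 4 * gc,
    }


def three_prime_complementary(p1, p2, n=4):
    """Rule 5: True if the last n bases of p1 can base pair with the last n of p2."""
    return p1.upper()[-n:] == reverse_complement(p2[-n:])


if __name__ == "__main__":
    print(check_primer("ATGACGCATAGCAGCATAGC"))
    print(three_prime_complementary("GGGGAAGG", "GGGGCCTT"))
```

Hairpin formation (rule 6) is not covered here; it needs a secondary-structure calculation of the kind the tools linked above provide.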
Restriction Analysis
• Restriction enzymes are found in bacteria and archaea.
• 4 types
    – Type I: cleavage remote from the recognition site (the enzyme also has
      methylase activity)
    – Type II: cleavage within, or at a specific distance from, the recognition site
    – Type III: cleavage a short distance from the recognition site
    – Type IV: cleaves modified (methylated) DNA




    Ref: http://insilico.ehu.es/restriction/long_seq/
    http://molbiol-tools.ca/Restriction_endonuclease.htm
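As an illustration of what a restriction-mapping tool does for Type II enzymes, here is a minimal sketch that locates recognition sites on the top strand of a linear sequence and returns the resulting fragments. The function names and the small enzyme table are assumptions made for this example; real tools such as those linked above use a full enzyme database.

```python
import re

# Recognition site and cut offset (bases from the start of the site, top strand)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_sites(seq, enzyme):
    """Return 1-based start positions of every recognition site in seq."""
    site, _ = ENZYMES[enzyme]
    # A lookahead is used so that overlapping matches would also be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq, enzyme):
    """Cut a linear sequence at every site and return the fragments in order."""
    _, offset = ENZYMES[enzyme]
    cuts = [pos - 1 + offset for pos in find_sites(seq, enzyme)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCTTGGATCCAA"
    print(find_sites(dna, "EcoRI"))   # [3]
    print(digest(dna, "EcoRI"))       # ['AAG', 'AATTCTTGGATCCAA']
```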
Gene Prediction
• Patterns
• Frame Consistency
• Dicodon frequencies
• PSSMs
• Coding Potential and Fickett’s statistics
• Fusion of Information
• Sensitivity and Specificity
• Prediction programs
• Known problems
Genes are all about Patterns – real life example
Gene Prediction Methods
Common sets of rules
• Homology
• Ab initio methods
    – Compositional information
    – Signal information
Pattern Recognition in Gene Finding
    • atgttggacagactggacgcaagacgtgtggatgaactcgttttggagctgaac
     aaggctctatacgtacttaatcaagcggggcgtttgtggagcgagt
     tacttcacaaaaagctagccaatttgggttcaatgcagtgcctgaccgacatggg
     tatgtattagtaacgtttggaagaagaaactgttgtggttggtgt
     ttatgcagacaatctacaggtgactgcaacgaattcaactctcgtggacagttttt
     tcgttgatttacaggacctctcggtaaaggactatgaagaggtg
     acaaaattcttggggatgcgcatttcttatgcgcctgaaaatgggtatgattatat
     atcgagaagtgacaacccgggaaatgataaaggataa


    • atggagaggatgctggagacggtcaagacgaccatcacccctgcgcaggcaatgaag
    ctgtttactgcacccaaagaacctcaagcgaacctggcccgag
    cacttcatgtacttggtggccatctcggaggcctgcggtggtacttagtcctgaataacg
    tcgtgccgtacgcgtccgcggatctacgaacggtcctgat
    agccaaagtggacggcacgcgtgtcgactacctacagcaagctgaggaactggcgca
    tttcgcgcaatcctgggagcttgaagcgcgcacgaagaacatt




    • We need to study the basic structure of genes first!
Gene Structure – Common sets of rules
• Generally true: all long (> 300 bp) ORFs in prokaryotic genomes
  encode genes

But this may not necessarily be true for eukaryotic genomes

• Eukaryotic introns begin with GT and end with AG (donor and
  acceptor sites)
   – The branch-point consensus CT(A/G)A(C/T) lies 20-50 bases upstream of the acceptor site.
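The "long ORF" rule for prokaryotes is easy to sketch in code. The example below is a simplified illustration under stated assumptions: it scans only the three forward frames of the given strand, treats an ORF as ATG followed in frame by the first stop codon, and reports 1-based inclusive coordinates. The function name and the `min_len` parameter are hypothetical; a full scan would also cover the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=300):
    """Return (start, end, frame) for each ATG...stop ORF of at least min_len bp."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if pos + 3 - start >= min_len:   # length includes the stop codon
                    orfs.append((start + 1, pos + 3, frame))
                start = None                     # look for the next ORF
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
    print(find_orfs(toy))  # [(3, 308, 2)]
```

For eukaryotic genomes this test alone is not enough, because exons can be much shorter than 300 bp and are interrupted by introns.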
Gene Structure

   • Each coding region (exon or whole gene) has a fixed
     translation frame
   • A coding region always sits inside an ORF of the same
     reading frame
   • All exons of a gene are on the same strand.
   • Neighboring exons of a gene could have different
     reading frames.
   • Exons need to be frame-consistent!

     GATGGGACGACAGATAAGGTGATAGATGGTAGGCAGCAG
      0  3 6 9 12                   01
Gene Structure – reading frame consistency
                Neighboring exons of a gene should be frame-consistent
                [Figure: the sequence GATGGGACGACAGATAGGTGATTAAGATGGTAGGCCGAGTGGTC drawn twice.
                 In both copies Exon1 (Frame 0) spans positions 1-16; Exon2 (Frame 2) starts at
                 position 33 in the first copy and at position 40 in the second.]
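                   • Exon1 (1,16): Frame = a = 0; i = 1 and j = 16
                   • Case 1: Exon2 (33,100): Frame = b = 2; m = 33 and n = 100
                   • Case 2: Exon2 (40,100): Frame = b = 2; m = 40 and n = 100

                   • exon1 (i, j) in frame a and exon2 (m, n) in frame b are consistent if

                                      b = (m - j - 1 + a) mod 3

                   • Case 1: (33 - 16 - 1 + 0) mod 3 = 1, which is not 2, so the exons are not consistent
                   • Case 2: (40 - 16 - 1 + 0) mod 3 = 2, so the exons are consistent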

12/7/2012
                                 Frame-consistent start positions m for Exon2: …31, 34, 37, 40, 43, 46, 49, 52…
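The consistency rule b = (m - j - 1 + a) mod 3 can be checked directly. This is a minimal sketch; the function name and the tuple-based exon representation are assumptions made for illustration.

```python
def frames_consistent(exon1, frame_a, exon2, frame_b):
    """Apply b = (m - j - 1 + a) mod 3 to exon1 = (i, j) and exon2 = (m, n)."""
    _i, j = exon1
    m, _n = exon2
    return frame_b == (m - j - 1 + frame_a) % 3


if __name__ == "__main__":
    exon1 = (1, 16)
    print(frames_consistent(exon1, 0, (33, 100), 2))  # False (Case 1)
    print(frames_consistent(exon1, 0, (40, 100), 2))  # True  (Case 2)
    # Start positions between 31 and 52 that keep Exon2 in frame 2
    print([m for m in range(31, 53)
           if frames_consistent(exon1, 0, (m, 100), 2)])
```

The last line reproduces the series of positions listed on the slide.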
Codon Frequencies
   • Coding sequences are translated into protein sequences
   • We found the following: the frequency of amino-acid dimers in protein sequences is
     NOT evenly distributed

   • Organism specific!

           The average dimer frequency is 0.25% (1/20
           * 1/20 = 1/400 = 0.25%)
           Some amino acids prefer to be next
           to each other
           Other amino acids prefer not to
           be next to each other
    ALA   ARG   ASN   ASP   CYS   GLU   GLN   GLY   HIS   ILE   LEU   LYS   MET   PHE   PRO   SER   THR   TRP   TYR   VAL
A   .99   .5    .27   .45   .13   .52   .34   .51   .19   .38   .83   .46   .2    .31   .37   .73   .56   .09   .18   .65
R   .5    .5    .2    .3    .1    .4    .3    .4    .18   .25   .63   .37   .14   .22   .26   .54   .34   .08   .17   .46
N   .31   .19   .11   .2    .05   .23   .23   .26   .07   .13   .27   .16   .07   .11   .15   .24   .16   .04   .08   .27
D   .54   .32   .17   .47   .08   .51   .19   .42   .13   .25   .48   .26   .12   .20   .24   .40   .29   .06   .14   .5
C   .14   .11   .05   .09   .04   .09   .06   .13   .04   .08   .14   .08   .03   .06   .07   .14   .10   .02   .04   .13
E   .57   .43   .22   .42   .09   .59   .30   .33   .16   .28   .64   .40   .17   .20   .21   .44   .36   .06   .16   .44
Q   .34   .31   .11   .20   .06   .27   .29   .20   .12   .15   .45   .21   .10   .13   .17   .29   .22   .05   .10   .29
G   .50   .39   .22   .37   .11   .37   .21   .50   .16   .28   .50   .33   .14   .23   .21   .54   .35   .07   .17   .46
H   .21   .17   .07   .14   .04   .16   .10   .17   .08   .09   .22   .10   .05   .09   .12   .17   .11   .03   .06   .21
I   .37   .25   .13   .27   .08   .27   .15   .27   .09   .15   .34   .22   .08   .14   .18   .29   .21   .04   .11   .32
L   .79   .65   .30   .53   .16   .62   .45   .50   .25   .31   .97   .47   .19   .32   .44   .71   .49   .10   .22   .67
K   .43   .41   .19   .26   .08   .35   .24   .26   .14   .20   .49   .41   .13   .17   .20   .37   .32   .07   .15   .33
M   .23   .17   .09   .17   .04   .19   .12   .14   .06   .10   .25   .15   .07   .08   .11   .20   .17   .03   .06   .17
F   .30   .20   .11   .22   .07   .22   .13   .26   .08   .14   .33   .15   .08   .14   .14   .27   .18   .04   .10   .28
P   .35   .28   .13   .23   .05   .27   .16   .24   .10   .17   .39   .19   .10   .15   .26   .42   .29   .04   .09   .33
S   .68   .51   .26   .46   .14   .43   .26   .55   .17   .33   .68   .38   .18   .28   .36   .93   .55   .09   .17   .56
T   .51   .36   .19   .29   .10   .31   .21   .37   .12   .25   .52   .32   .13   .21   .31   .54   .42   .07   .13   .39
W   .07   .08   .05   .06   .02   .06   .05   .06   .03   .06   .12   .07   .03   .04   .04   .09   .07   .02   .03   .07
Y   .21   .16   .09   .15   .05   .16   .09   .17   .06   .10   .23   .11   .06   .10   .1    .17   .13   .03   .07   .2
V   .69   .42   .24   .46   .13   .48   .31   .42   .18   .29   .72   .37   .16   .27   .33   .55   .42   .07   .18   .61
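A table like the one above could be produced by counting adjacent amino-acid pairs in a set of protein sequences. The sketch below is only an illustration of that counting step: the function name is hypothetical, and the toy input is not the data behind the slide.

```python
from collections import Counter


def dimer_frequencies(proteins):
    """Return the percentage of each overlapping amino-acid dimer in the proteins."""
    counts = Counter()
    for protein in proteins:
        protein = protein.upper()
        # Every adjacent pair of residues counts as one dimer occurrence
        counts.update(protein[k:k + 2] for k in range(len(protein) - 1))
    total = sum(counts.values())
    return {dimer: 100.0 * n / total for dimer, n in counts.items()}


if __name__ == "__main__":
    freqs = dimer_frequencies(["MALLALAV", "MKALLS"])
    uniform = 100.0 / 400  # 0.25% if all 400 dimers were equally likely
    for dimer, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
        print(dimer, round(pct, 2), "above" if pct > uniform else "below", "uniform")
```

With only a handful of sequences every observed dimer is far above 0.25%; the comparison with the uniform expectation becomes meaningful only on a proteome-sized input.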
Dicodon Frequencies
   • Believe it or not: the biased (uneven) dimer frequencies are the
     foundation of many gene finding programs!

   • Basic idea: if a dimer has a lower than average frequency, proteins
     tend not to have that dimer in their sequences.

      Hence if we see a dicodon encoding this dimer, we may
      want to bet against this dicodon being in a coding
      region!
Dicodon Frequencies - Examples

   • Relative frequency of a dicodon in coding versus non-coding regions
        – Frequency of dicodon X (e.g., AAAAAA) in coding regions: total number of occurrences of
          X in coding regions divided by the total number of dicodon occurrences there
        – Frequency of dicodon X (e.g., AAAAAA) in non-coding regions: total number of
          occurrences of X in non-coding regions divided by the total number of dicodon occurrences there

     In the human genome, the frequency of dicodon “AAA AAA” is
     ~1% in coding regions versus ~5% in non-coding regions

         Question: if you see a region with many “AAA AAA”,
         would you guess it is a coding or non-coding region?
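The relative frequency defined above is a simple ratio of counts. A minimal sketch follows; the function name is hypothetical, and it assumes each input sequence is read in frame from its first base, which is natural for coding sequences but an arbitrary choice for non-coding ones.

```python
from collections import Counter


def dicodon_frequencies(seqs):
    """Return occurrences of each in-frame dicodon divided by all dicodon occurrences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # Step by one codon (3 bases) and read two codons (6 bases) at a time
        for k in range(0, len(seq) - 5, 3):
            counts[seq[k:k + 6]] += 1
    total = sum(counts.values())
    return {dicodon: n / total for dicodon, n in counts.items()}


if __name__ == "__main__":
    print(dicodon_frequencies(["AAAAAAGGG"]))  # {'AAAAAA': 0.5, 'AAAGGG': 0.5}
```

Running it once on known coding sequences and once on known non-coding sequences gives the two tables that are compared on this slide.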
Basic idea of gene finding
   • Most dicodons show a bias towards either coding or non-coding
     regions; only a fraction of dicodons are neutral
   • Foundation for coding region identification

       Regions consisting of dicodons that mostly tend
         to be in coding regions are probably coding
           regions; otherwise they are probably non-coding regions

   • Dicodon frequencies are a key signal used for coding region
     detection; many gene finding programs use this information
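One common way to turn this idea into a number is a log-odds score: sum, over the dicodons in a region, the logarithm of the coding frequency divided by the non-coding frequency. The sketch below is an assumption about how such a score could be computed, not the method of any particular gene finder; the function name and the `floor` value used for unseen dicodons are hypothetical. The two frequencies in the usage example are the approximate "AAA AAA" values quoted on the earlier slide.

```python
import math


def coding_score(seq, coding_freq, noncoding_freq, floor=1e-6):
    """Sum log10(coding/non-coding frequency) over in-frame dicodons of seq.

    A positive score favours a coding region, a negative score a non-coding one.
    Dicodons missing from a table get the small 'floor' frequency (an assumption).
    """
    seq = seq.upper()
    score = 0.0
    for k in range(0, len(seq) - 5, 3):
        dicodon = seq[k:k + 6]
        score += math.log10(coding_freq.get(dicodon, floor)
                            / noncoding_freq.get(dicodon, floor))
    return score


if __name__ == "__main__":
    coding = {"AAAAAA": 0.01}      # ~1% in coding regions
    noncoding = {"AAAAAA": 0.05}   # ~5% in non-coding regions
    print(coding_score("AAAAAAAAAAAA", coding, noncoding))  # negative
```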
Prediction of Translation Starts Using PSSM
   • Certain nucleotides prefer to be in certain positions around the start
     “ATG” and other nucleotides prefer not to be there

        TCTAGAAGATGGCAGTGGCGAAGA
        TCTAGAAAATGACAGTGGCGAAGA
        TCTAGAAAATGGCAGTAGCGAAGA
        TCTACTAAATGATAGTAGCGAAGA

     Frequency (%) of each nucleotide at the eight positions before ATG:

         -8    -7    -6    -5    -4    -3    -2    -1
     A   0     0     0     100   0     75    100   75    ATG
     C   0     100   0     0     25    0     0     0     ATG
     G   0     0     0     0     75    0     0     25    ATG
     T   100   0     100   0     0     25    0     0     ATG

     [Table: frequencies of A, C, G and T at positions -4, -3, -2, -1, +3, +4, +5, +6 around ATG; values not recovered]

                       CACC ATG GC
                       TCGA ATG TT

   • The “biased” nucleotide distribution is information! It is a basis for
     translation start prediction

   • Question: which one is more probable to be a translation start?
Prediction of Translation Starts
               • Mathematical model: Fi(X) = frequency of X (A, C, G, T) in
                 position i
               • Score a string by summing log10(Fi(X)/0.25) over its positions

            CACC ATG GC:
                log(58/25) + log(49/25) + log(40/25) + log(50/25) + log(43/25) + log(39/25)
                = 0.37 + 0.29 + 0.20 + 0.30 + 0.24 + 0.19
                = 1.59

            TCGA ATG TT:
                log(6/25) + log(6/25) + log(15/25) + log(15/25) + log(13/25) + log(14/25)
                = -(0.62 + 0.62 + 0.22 + 0.22 + 0.28 + 0.25)
                = -2.21


                                  The model captures our intuition!
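The scoring rule can be reproduced in a few lines. This is a minimal sketch: the `PSSM` list holds only the twelve percentages quoted in the worked example (the rest of the matrix is not available), the six columns are the four bases before ATG followed by the two bases after it, and the function name is hypothetical.

```python
import math

# One dict per scored position; values are frequencies in percent.
# Only the entries quoted in the worked example are filled in.
PSSM = [
    {"C": 58, "T": 6},
    {"A": 49, "C": 6},
    {"C": 40, "G": 15},
    {"C": 50, "A": 15},
    {"G": 43, "T": 13},
    {"C": 39, "T": 14},
]


def start_score(flank):
    """Sum log10(Fi(X)/25) over the six flanking bases (ATG itself is left out).

    Raises KeyError for a base whose frequency is not in the partial matrix.
    """
    return sum(math.log10(column[base] / 25.0)
               for column, base in zip(PSSM, flank.upper()))


if __name__ == "__main__":
    print(round(start_score("CACC" + "GC"), 2))  # candidate CACC ATG GC
    print(round(start_score("TCGA" + "TT"), 2))  # candidate TCGA ATG TT
```

The first candidate gets a positive score and the second a negative one, so the first is the more probable translation start under this model.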

Evaluation of Gene prediction
[Figure: real versus predicted exons along a sequence, with regions labelled TP, FP, TN, FN, TP]

       •    Sensitivity = No. of correct exons / No. of actual exons (a measure of false
            negatives) -> how many are discarded by mistake
                                Sn = TP/(TP+FN)
       •    Specificity = No. of correct exons / No. of predicted exons (a measure of
            false positives) -> how many are included by mistake
                                 Sp = TP/(TP+FP)

       •    CC = correlation coefficient, a metric combining both

                     CC = ((TP*TN) - (FN*FP)) / sqrt( (TP+FN)*(TN+FP)*(TP+FP)*(TN+FN) )
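The three measures can be computed from the four counts as follows. This is a minimal sketch with a hypothetical function name; note that "specificity" here follows the gene-prediction convention used on the slide, TP/(TP+FP), and that CC is returned as NaN when its denominator is zero.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Return (Sn, Sp, CC) for a gene prediction from its TP, FP, TN, FN counts."""
    sn = tp / (tp + fn)                  # fraction of real exons that were found
    sp = tp / (tp + fp)                  # fraction of predicted exons that are real
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else float("nan")
    return sn, sp, cc


if __name__ == "__main__":
    sn, sp, cc = evaluate(tp=8, fp=2, tn=88, fn=2)
    print(sn, sp, round(cc, 3))  # 0.8 0.8 0.778
```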


Challenges for gene finders


•   Alternative splicing
•   Nested/overlapping genes
•   Extremely long/short genes
•   Extremely long introns
•   Non-canonical introns
•   Split start codons
•   UTR introns
•   Non-ATG triplet as the start codon
•   Polycistronic genes
•   Repeats/transposons
Known Gene Finders

   • GENSCAN
   • GeneMark.hmm
   • FGENESH
   • GlimmerHMM
   • GeneZilla
   • SNAP
   • PHAT
   • AUGUSTUS
   • Genie



 Ref: http://bioinf.uni-greifswald.de/augustus/submission

Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 

Primer design and gene prediction

  • 1. IICB Course work, 8th Dec 2012
  • 2. Topics to be covered: primer designing; restriction mapping; gene prediction
  • 3. ATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGACG AGACGCAGAATAGAGGACGAGCAATGATGATAGAAATGACGCATAGCAGCAT AGACGCATAGACGACGCATAGACGACATAGAGACGAGACGCAGAATAGAGGA CGAGCAATGATGATAGAAATGACGCATAGCAGCATAGACGCATAGACGACGC ATAGACGACATAGAGACGAGACGCAGAATAGAGGACGAGCAATGATGATAGA AATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGAC GAGACGCAGAATAGAGGACGAGCAATGATGATAGAAATGACGCATAGCAGCA TAGACGCATAGACGACGCATAGACGACATAGAGACGAGACGCAGAATAGAGG ACGAGCAATGATGATAGAAATGACGCATAGCAGCATAGACGCATAGACGACG CATAGACGACATAGAGACGAGACGCAGAATAGAGGACGAGCAATGATGATAG AAATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGA CGAGACGCAGAATAGAGGACGAGCAATGATGATAGAA ATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGACG AGACGCAGAATAGAGGACGAGCAATGATGATAGAAATGACGCATAGCAGCAT AGACGCATAGACGACGCATAGACGACATAGAGACGAGACGCAGAATAGAGGA CGAGCAATGATGATAGAAATGACGCATAGCAGCATAGACGCATAGACGACGC ATAGACGACATAGAGACGAGACGCAGAATAGAGGACGAGCAATGATGATAGA AATGACGCATAGCAGCATAGACGCATAGACGACGCATAGACGACATAGAGAC GAGACGCAGAATAGAGGACGAGCAATGATGATAGAA
      Oligo Analyzer:
      http://www.idtdna.com/analyzer/Applications/OligoAnalyzer/
      http://www.idtdna.com/Analyzer/Applications/Instructions/Default.aspx?AnalyzerDefinitions=true
  • 4. Primer Designing: no non-specific binding; melting temperature; a primer should not form dimers with itself or with other primers.
  • 5. Some thoughts
      1. primers should be 17-28 bases in length;
      2. base composition should be 50-60% (G+C);
      3. primers should end (3') in a G or C, or CG or GC: this prevents "breathing" of ends and increases efficiency of priming;
      4. Tms between 55 and 80 °C are preferred;
      5. 3'-ends of primers should not be complementary (i.e. base pair), as otherwise primer dimers will be synthesised preferentially to any other product;
      6. primer self-complementarity (ability to form secondary (2°) structures such as hairpins) should be avoided;
      7. runs of three or more Cs or Gs at the 3'-ends of primers may promote mispriming at G- or C-rich sequences (because of stability of annealing), and should be avoided.
      Adapted from: Innis and Gelfand, 1991
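
The rules on slide 5 can be turned into a quick screening function. The following is only a minimal sketch: the Wallace rule used for Tm is an assumption that does not come from the slides, rule 7 is read here as "the last three bases are all G or C", and the two example primers are made up. Rules 5 and 6 (dimers and hairpins) are not checked.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Crude Tm estimate in degrees C: 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is not mentioned on the slide; it is
    used here only as a stand-in for a proper Tm calculation.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer):
    """Return a list of the rules (1-4 and 7) that the primer violates."""
    primer = primer.upper()
    problems = []
    if not 17 <= len(primer) <= 28:
        problems.append("length outside 17-28 bases")
    if not 0.50 <= gc_fraction(primer) <= 0.60:
        problems.append("G+C outside 50-60%")
    if primer[-1] not in "GC":
        problems.append("3' end is not G or C")
    if not 55 <= wallace_tm(primer) <= 80:
        problems.append("Tm outside 55-80 C")
    # Assumed reading of rule 7: the last three bases are all G or C.
    if all(base in "GC" for base in primer[-3:]):
        problems.append("run of three G/C at the 3' end")
    return problems


print(check_primer("AGCTGACCTGAGGTCATCAG"))  # hypothetical primer, passes all checks
print(check_primer("ATATATATGGG"))           # hypothetical primer, fails several
```

A primer that passes this screen would still need to be checked for dimers and hairpins, for example with the Oligo Analyzer linked on slide 3.
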
  • 7. Restriction Analysis
      Restriction enzymes are found in bacteria and archaea. There are 4 types:
      • Type I: cleavage remote from the recognition site (also has methylase activity)
      • Type II: cleavage within, or at a specific short distance from, the recognition site
      • Type III: cleavage a short distance from the recognition site
      • Type IV: cleaves modified (methylated) DNA
      Ref: http://insilico.ehu.es/restriction/long_seq/ http://molbiol-tools.ca/Restriction_endonuclease.htm
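
For a Type II enzyme, a restriction map of a linear sequence comes down to locating the recognition site and cutting at a fixed offset. The sketch below uses EcoRI (recognition site GAATTC, cut after the first G) as an illustration; the enzyme choice, the function names and the test sequence are assumptions, not part of the slides. Only the given strand is scanned, which is enough for a palindromic site.

```python
def find_sites(seq, site):
    """Return the 1-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence `cut_offset` bases into each site."""
    cuts = [pos - 1 + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


example = "AAGAATTCGGGGAATTCTT"          # hypothetical sequence
print(find_sites(example, "GAATTC"))     # site positions
print(digest(example, "GAATTC", 1))      # fragment lengths for G^AATTC
```
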
  • 8. Gene Prediction: patterns; frame consistency; dicodon frequencies; PSSMs; coding potential and Fickett’s statistics; fusion of information; sensitivity and specificity; prediction programs; known problems
  • 9. Genes are all about Patterns – real life example
  • 10. Gene Prediction Methods: common sets of rules; homology; ab initio methods; compositional information; signal information
  • 11. Pattern Recognition in Gene Finding  atgttggacagactggacgcaagacgtgtggatgaactcgttttggagctgaac aaggctctatacgtacttaatcaagcggggcgtttgtggagcgagt tacttcacaaaaagctagccaatttgggttcaatgcagtgcctgaccgacatggg tatgtattagtaacgtttggaagaagaaactgttgtggttggtgt ttatgcagacaatctacaggtgactgcaacgaattcaactctcgtggacagttttt tcgttgatttacaggacctctcggtaaaggactatgaagaggtg acaaaattcttggggatgcgcatttcttatgcgcctgaaaatgggtatgattatat atcgagaagtgacaacccgggaaatgataaaggataa atggagaggatgctggagacggtcaagacgaccatcacccctgcgcaggcaatgaag ctgtttactgcacccaaagaacctcaagcgaacctggcccgag cacttcatgtacttggtggccatctcggaggcctgcggtggtacttagtcctgaataacg tcgtgccgtacgcgtccgcggatctacgaacggtcctgat agccaaagtggacggcacgcgtgtcgactacctacagcaagctgaggaactggcgca tttcgcgcaatcctgggagcttgaagcgcgcacgaagaacatt We need to study the basic structures of genes first ….!
  • 12. Gene Structure – Common sets of rules
      • Generally true: all long (> 300 bp) ORFs in prokaryotic genomes encode genes. This is not necessarily true for eukaryotic genomes.
      • Eukaryotic introns begin with GT (donor site) and end with AG (acceptor site), with the branch-point consensus CT(A/G)A(C/T) 20-50 bases upstream of the acceptor site.
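
The "long ORF" rule of thumb for prokaryotes can be sketched as a simple scan. This is a minimal illustration under stated assumptions: an ORF is taken to run from the first ATG to the next in-frame stop codon, only the given strand is scanned (a real search would also scan the reverse complement), and the test sequence is made up.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=300):
    """Return (start, end, frame) for ATG...stop ORFs of at least `min_len` bp.

    Coordinates are 1-based; `end` includes the stop codon. Only the
    given strand is scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:   # length includes the stop codon
                    orfs.append((start + 1, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs


# Hypothetical test sequence: one 306 bp ORF followed by a 9 bp ORF.
seq = "CC" + "ATG" + "GCA" * 100 + "TAA" + "ATGAAATAG"
print(find_orfs(seq))             # only the long ORF passes the 300 bp cutoff
print(find_orfs(seq, min_len=9))  # lowering the cutoff also reports the short one
```
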
  • 13. Gene Structure
      • Each coding region (exon or whole gene) has a fixed translation frame.
      • A coding region always sits inside an ORF of the same reading frame.
      • All exons of a gene are on the same strand.
      • Neighboring exons of a gene could have different reading frames.
      • Exons need to be frame-consistent!
      (figure: GATGGGACGACAGATAAGGTGATAGATGGTAGGCAGCAG with position ticks 0 3 6 9 12 ...)
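  • 14. Gene Structure – reading frame consistency
      • Neighboring exons of a gene should be frame-consistent.
      (figure, Case 1: GATGGGACGACAGATAGGTGATTAAGATGGTAGGCCGAGTGGTC with exon 1 at positions 1-16 in frame 0 and exon 2 starting at position 33 in frame 2)
      (figure, Case 2: GATGGGACGACAGATAGGTGATTAAGATGGTAGGCCGAGTGGTC with exon 1 at positions 1-16 in frame 0 and exon 2 starting at position 40 in frame 2)
      Exon1 (1,16): frame a = 0; i = 1 and j = 16
      Case 1: Exon2 (33,100): frame b = 2; m = 33 and n = 100
      Case 2: Exon2 (40,100): frame b = 2; m = 40 and n = 100
      Exon1 (i, j) in frame a and exon2 (m, n) in frame b are consistent if b = (m - j - 1 + a) mod 3
      Case 1: (33 - 16 - 1 + 0) mod 3 = 1, which is not 2, so the exons are not consistent.
      Case 2: (40 - 16 - 1 + 0) mod 3 = 2, so the exons are consistent.
      Consistent start positions for exon 2 in frame 2: …31,34,37,40,43,46,49,52…
      12/7/2012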
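
The consistency rule on slide 14 is a one-line check. The sketch below applies it to the two cases from the slide; the function names are illustrative only.

```python
def expected_frame(a, j, m):
    """Frame exon 2 must have if exon 1 ends at j in frame a and exon 2 starts at m."""
    return (m - j - 1 + a) % 3


def frame_consistent(exon1, frame_a, exon2, frame_b):
    """Apply b = (m - j - 1 + a) mod 3 to exon1 = (i, j) and exon2 = (m, n)."""
    _, j = exon1
    m, _ = exon2
    return frame_b == expected_frame(frame_a, j, m)


print(frame_consistent((1, 16), 0, (33, 100), 2))  # Case 1
print(frame_consistent((1, 16), 0, (40, 100), 2))  # Case 2

# Start positions of exon 2 (frame 2) that are consistent with exon 1 (1,16) in frame 0
print([m for m in range(31, 53) if expected_frame(0, 16, m) == 2])
```

Case 1 comes out inconsistent and Case 2 consistent, and the last line lists the start positions 31, 34, ..., 52 given on the slide.
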
  • 15. Codon Frequencies
      • Coding sequences are translated into protein sequences.
      • We found the following: the dimer (adjacent amino-acid pair) frequency in protein sequences is NOT evenly distributed.
      • It is organism-specific!
      The average frequency is 0.25% (1/20 * 1/20 = 1/400 = 0.25%).
      Some amino acids prefer to be next to each other; some other amino acids prefer not to be next to each other.
  • 16. Dimer frequency table (rows: one-letter amino-acid codes; columns: three-letter codes)
         ALA  ARG  ASN  ASP  CYS  GLU  GLN  GLY  HIS  ILE  LEU  LYS  MET  PHE  PRO  SER  THR  TRP  TYR  VAL
      A  .99  .5   .27  .45  .13  .52  .34  .51  .19  .38  .83  .46  .2   .31  .37  .73  .56  .09  .18  .65
      R  .5   .5   .2   .3   .1   .4   .3   .4   .18  .25  .63  .37  .14  .22  .26  .54  .34  .08  .17  .46
      N  .31  .19  .11  .2   .05  .23  .23  .26  .07  .13  .27  .16  .07  .11  .15  .24  .16  .04  .08  .27
      D  .54  .32  .17  .47  .08  .51  .19  .42  .13  .25  .48  .26  .12  .20  .24  .40  .29  .06  .14  .5
      C  .14  .11  .05  .09  .04  .09  .06  .13  .04  .08  .14  .08  .03  .06  .07  .14  .10  .02  .04  .13
      E  .57  .43  .22  .42  .09  .59  .30  .33  .16  .28  .64  .40  .17  .20  .21  .44  .36  .06  .16  .44
      Q  .34  .31  .11  .20  .06  .27  .29  .20  .12  .15  .45  .21  .10  .13  .17  .29  .22  .05  .10  .29
      G  .50  .39  .22  .37  .11  .37  .21  .50  .16  .28  .50  .33  .14  .23  .21  .54  .35  .07  .17  .46
      H  .21  .17  .07  .14  .04  .16  .10  .17  .08  .09  .22  .10  .05  .09  .12  .17  .11  .03  .06  .21
      I  .37  .25  .13  .27  .08  .27  .15  .27  .09  .15  .34  .22  .08  .14  .18  .29  .21  .04  .11  .32
      L  .79  .65  .30  .53  .16  .62  .45  .50  .25  .31  .97  .47  .19  .32  .44  .71  .49  .10  .22  .67
      K  .43  .41  .19  .26  .08  .35  .24  .26  .14  .20  .49  .41  .13  .17  .20  .37  .32  .07  .15  .33
      M  .23  .17  .09  .17  .04  .19  .12  .14  .06  .10  .25  .15  .07  .08  .11  .20  .17  .03  .06  .17
      F  .30  .20  .11  .22  .07  .22  .13  .26  .08  .14  .33  .15  .08  .14  .14  .27  .18  .04  .10  .28
      P  .35  .28  .13  .23  .05  .27  .16  .24  .10  .17  .39  .19  .10  .15  .26  .42  .29  .04  .09  .33
      S  .68  .51  .26  .46  .14  .43  .26  .55  .17  .33  .68  .38  .18  .28  .36  .93  .55  .09  .17  .56
      T  .51  .36  .19  .29  .10  .31  .21  .37  .12  .25  .52  .32  .13  .21  .31  .54  .42  .07  .13  .39
      W  .07  .08  .05  .06  .02  .06  .05  .06  .03  .06  .12  .07  .03  .04  .04  .09  .07  .02  .03  .07
      Y  .21  .16  .09  .15  .05  .16  .09  .17  .06  .10  .23  .11  .06  .10  .1   .17  .13  .03  .07  .2
      V  .69  .42  .24  .46  .13  .48  .31  .42  .18  .29  .72  .37  .16  .27  .33  .55  .42  .07  .18  .61
  • 17.
  • 18. Dicodon Frequencies
      • Believe it or not: the biased (uneven) dimer frequencies are the foundation of many gene finding programs!
      • Basic idea: if a dimer has a lower than average frequency, proteins prefer not to have that dimer in their sequences. Hence, if we see a dicodon encoding this dimer, we may want to bet against this dicodon being in a coding region!
  • 19. Dicodon Frequencies - Examples
      • Relative frequencies of a dicodon in coding versus non-coding regions:
      • frequency of dicodon X (e.g., AAAAAA) in coding regions = total number of occurrences of X divided by the total number of dicodon occurrences in coding regions
      • frequency of dicodon X (e.g., AAAAAA) in non-coding regions = total number of occurrences of X divided by the total number of dicodon occurrences in non-coding regions
      In the human genome, the frequency of dicodon “AAA AAA” is ~1% in coding regions versus ~5% in non-coding regions.
      Question: if you see a region with many “AAA AAA”, would you guess it is a coding or a non-coding region?
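
The definition on slide 19 (occurrences of X divided by all dicodon occurrences) can be sketched as below. Two assumptions are made that the slide does not state: dicodons are counted in frame, stepping one codon at a time, and the comparison between coding and non-coding is expressed as a base-2 log ratio. The input sequence is a toy example.

```python
import math
from collections import Counter


def dicodon_frequencies(seqs):
    """Relative frequency of each dicodon (6-mer read in frame, one codon step)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 5, 3):
            counts[seq[i:i + 6]] += 1
    total = sum(counts.values())
    return {dicodon: n / total for dicodon, n in counts.items()}


def log_ratio(f_coding, f_noncoding):
    """log2 of coding over non-coding frequency; negative favours non-coding."""
    return math.log2(f_coding / f_noncoding)


print(dicodon_frequencies(["ATGAAAAAATAA"]))  # toy sequence with three dicodons
print(log_ratio(0.01, 0.05))                  # "AAA AAA": ~1% coding vs ~5% non-coding
```

With the slide's figures for “AAA AAA” the log ratio is negative, so a region rich in this dicodon is more likely non-coding.
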
  • 20. Basic idea of gene finding
      • Most dicodons show a bias towards either coding or non-coding regions; only a fraction of dicodons is neutral.
      • Foundation for coding region identification: regions consisting of dicodons that mostly tend to be in coding regions are probably coding regions; otherwise they are probably non-coding regions.
      • Dicodon frequencies are a key signal used for coding region detection; all gene finding programs use this information.
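
One way to turn this idea into a score is to sum, over all dicodons in a region, the log of the coding frequency over the non-coding frequency. The sketch below is an assumption about how such a score could look, not the method of any particular program; the frequency tables are hypothetical except for the ~1% versus ~5% figures for AAAAAA quoted on slide 19, and the small pseudocount is there only to avoid dividing by zero for unseen dicodons.

```python
import math


def coding_score(region, f_coding, f_noncoding, pseudo=1e-6):
    """Sum of log-odds over in-frame dicodons; a positive score favours coding."""
    region = region.upper()
    score = 0.0
    for i in range(0, len(region) - 5, 3):
        dicodon = region[i:i + 6]
        fc = f_coding.get(dicodon, 0.0) + pseudo
        fn = f_noncoding.get(dicodon, 0.0) + pseudo
        score += math.log(fc / fn)
    return score


# Hypothetical frequency tables (only the AAAAAA values come from the slides).
f_coding = {"AAAAAA": 0.01, "GCTGAA": 0.04}
f_noncoding = {"AAAAAA": 0.05, "GCTGAA": 0.01}

print(coding_score("AAAAAAAAA", f_coding, f_noncoding))     # negative: looks non-coding
print(coding_score("GCTGAAGCTGAA", f_coding, f_noncoding))  # positive: looks coding
```
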
  • 21. Prediction of Translation Starts Using PSSM
      • Certain nucleotides prefer to be in certain positions around the start “ATG”, and other nucleotides prefer not to be there.
      Aligned example sequences:
        TCTAGAAGATGGCAGTGGCGAAGA
        TCTAGAAAATGACAGTGGCGAAGA
        TCTAGAAAATGGCAGTAGCGAAGA
        TCTACTAAATGATAGTAGCGAAGA
      Nucleotide frequencies (%) in the first eight alignment columns:
        A     0    0    0  100    0   75  100   75
        C     0  100    0    0   25    0    0    0
        G     0    0    0    0   75    0    0   25
        T   100    0  100    0    0   25    0    0
      (figure: PSSM for A, C, G, T at positions -4 -3 -2 -1 ATG +3 +4 +5 +6)
      • The “biased” nucleotide distribution is information! It is a basis for translation start prediction.
      • Question: which one is more probable to be a translation start, CACC ATG GC or TCGA ATG TT?
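
A position-specific frequency matrix like the one on slide 21 is obtained by counting each nucleotide column by column over aligned sequences. The sketch below does this for the four example sequences; it assumes that the stray spaces in the fourth sequence are extraction artifacts, and the function name is illustrative.

```python
def column_frequencies(aligned, n_cols):
    """Percent frequency of A, C, G, T in each of the first `n_cols` alignment columns."""
    matrix = {base: [] for base in "ACGT"}
    for col in range(n_cols):
        column = [seq[col] for seq in aligned]
        for base in "ACGT":
            matrix[base].append(100 * column.count(base) / len(aligned))
    return matrix


seqs = [
    "TCTAGAAGATGGCAGTGGCGAAGA",
    "TCTAGAAAATGACAGTGGCGAAGA",
    "TCTAGAAAATGGCAGTAGCGAAGA",
    "TCTACTAAATGATAGTAGCGAAGA",  # assumed reading of the garbled fourth sequence
]

pssm = column_frequencies(seqs, 8)
for base in "ACGT":
    print(base, pssm[base])
```
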
  • 22. Prediction of Translation Starts
      • Mathematical model: Fi (X) = frequency of X (A, C, G, T) in position i
      • Score a string by the sum of log (Fi (X)/0.25) over its positions (0.25 is the uniform background; with frequencies in percent, divide by 25; logs are base 10)
      CACC ATG GC: log (58/25) + log (49/25) + log (40/25) + log (50/25) + log (43/25) + log (39/25) = 0.37 + 0.29 + 0.20 + 0.30 + 0.24 + 0.19 ≈ 1.59
      TCGA ATG TT: log (6/25) + log (6/25) + log (15/25) + log (15/25) + log (13/25) + log (14/25) = -(0.62 + 0.62 + 0.22 + 0.22 + 0.28 + 0.25) ≈ -2.22
      The model captures our intuition!
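
The scoring rule can be checked numerically. In the sketch below the frequency table holds only the twelve values quoted in the worked example; assigning them to positions -4 to -1, +3 and +4 in the order they appear in the sums is an assumption, and the rest of the matrix is not given on the slide. Base-10 logarithms are used.

```python
import math

# Frequencies (%) quoted in the worked example, keyed by position.
# Assumption: the terms of each sum are listed in positional order.
FREQ = {
    -4: {"C": 58, "T": 6},
    -3: {"A": 49, "C": 6},
    -2: {"C": 40, "G": 15},
    -1: {"C": 50, "A": 15},
    3: {"G": 43, "T": 13},
    4: {"C": 39, "T": 14},
}
POSITIONS = [-4, -3, -2, -1, 3, 4]


def start_score(upstream, downstream, background=25):
    """Sum of log10(F_i(X) / background) over the flanking bases of an ATG."""
    bases = upstream + downstream
    return sum(math.log10(FREQ[pos][base] / background)
               for pos, base in zip(POSITIONS, bases))


print(round(start_score("CACC", "GC"), 2))  # CACC ATG GC
print(round(start_score("TCGA", "TT"), 2))  # TCGA ATG TT
```

With these inputs the first string scores about 1.59 and the second about -2.22, so CACC ATG GC is the more probable translation start.
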
  • 23. Evaluation of Gene prediction
      (figure: real versus predicted regions, labelled TP, FP, TN, FN)
      • Sensitivity = no. of correct exons / no. of actual exons (a measure of false negatives: how many are discarded by mistake). Sn = TP/(TP+FN)
      • Specificity = no. of correct exons / no. of predicted exons (a measure of false positives: how many are included by mistake). Sp = TP/(TP+FP)
      • CC = metric combining both. CC = ((TP*TN) - (FN*FP)) / sqrt( (TP+FN)*(TN+FP)*(TP+FP)*(TN+FN) )
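
The three measures follow directly from the four counts. The sketch below implements them as defined on slide 23 (note that Sp here is TP/(TP+FP), as on the slide); the counts in the example are hypothetical.

```python
import math


def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): fraction of real exons that were predicted."""
    return tp / (tp + fn)


def specificity(tp, fp):
    """Sp = TP / (TP + FP): fraction of predicted exons that are real."""
    return tp / (tp + fp)


def correlation(tp, tn, fp, fn):
    """CC = (TP*TN - FN*FP) / sqrt((TP+FN)*(TN+FP)*(TP+FP)*(TN+FN))."""
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fn * fp) / denom


# Hypothetical counts for illustration only.
tp, tn, fp, fn = 80, 890, 10, 20
print(sensitivity(tp, fn))
print(specificity(tp, fp))
print(correlation(tp, tn, fp, fn))
```
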
  • 24. Challenges for gene finders: alternative splicing; nested/overlapping genes; extremely long/short genes; extremely long introns; non-canonical introns; split start codons; UTR introns; non-ATG triplets as the start codon; polycistronic genes; repeats/transposons
  • 25. Known Gene Finders: GENSCAN; GeneMark.hmm; Fgenesh; GlimmerHMM; GeneZilla; SNAP; PHAT; AUGUSTUS; Genie
      Ref: http://bioinf.uni-greifswald.de/augustus/submission