Lecture 7: Codon Usage Analysis and Plotting Open Reading Frames
Dr. Naulikha Kituyi
Department of Biological Sciences
University of Embu
Biochemistry-
2021
Codon Usage
SELF-TEST QUESTIONS
• Explain why some genes may not be fully expressed in
certain host organisms when recombinant DNA
technology is used.
• State the relevance of ORF finding in bioinformatics.
Codon usage refers to the frequency with which a particular
organism uses the available codons in its genes. Most
amino acids are coded for by more than one codon (see the
genetic code), and there are marked preferences for the use
of the alternative codons among different species. For
example, in the bacterium Escherichia coli, CCG is the preferred
codon for the amino acid proline, rather than CCU, CCC or CCA.
Codon usage can affect the efficiency of translation, particularly
when cloned heterologous genes are being expressed. If the cloned
gene contains a large number of unfavoured codons, the tRNA
molecules of the host may not decode them efficiently (the matching
tRNAs are often scarce), which reduces translation and hence the
amount of protein synthesized.
Codon usage bias refers to differences in the frequency of
occurrence of synonymous codons in coding DNA. A codon is a
series of three nucleotides (a triplet) that encodes a specific
amino acid residue in a polypeptide chain or signals the
termination of translation (stop codons).
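As a minimal sketch of how codon usage can be measured, the Python below counts the codons in a coding sequence and reports the share of each synonymous proline codon. The function names and the toy sequence are made up for illustration; a real analysis would pool many genes from the organism.

```python
from collections import Counter

def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper().replace("U", "T")   # accept RNA or DNA spelling
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def synonymous_fractions(counts, synonymous_codons):
    """Fraction of each codon among a set of synonymous codons."""
    total = sum(counts[c] for c in synonymous_codons)
    if total == 0:
        return {c: 0.0 for c in synonymous_codons}
    return {c: counts[c] / total for c in synonymous_codons}

# The four proline codons, written as DNA (CCU in RNA is CCT in DNA).
PROLINE_CODONS = ["CCT", "CCC", "CCA", "CCG"]

# Hypothetical toy sequence: ATG CCG CCG CCA CCG CCT TAA
toy_cds = "ATGCCGCCGCCACCGCCTTAA"
counts = codon_counts(toy_cds)
proline_usage = synonymous_fractions(counts, PROLINE_CODONS)
print(proline_usage)
```

In this toy sequence CCG accounts for three of the five proline codons, which is the kind of preference the term codon usage bias describes.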
Illustration of Codon Usage bias
Although codon usage is not the only
barrier to success in heterologous gene
expression, a codon usage comparison provides
a very easy way to check the compatibility of
your gene and expression system before
you start. The diagram illustrates
codon usage bias in P. patens
compared with other selected organisms.
Codon usage differs between all genes
and highly expressed genes. Codons
significantly overrepresented in highly
expressed genes are colored dark
green and marked by a (+); significantly
underrepresented codons are colored
light blue and marked by a (–)
(adjusted p < 0.05, Fisher's exact test).
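To make the statistical test named in the caption concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2x2 table of codon counts. The table layout, the function name and the example counts are assumptions for illustration; the sketch does not reproduce the multiple-testing adjustment behind the adjusted p-values in the figure.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative):
      a: occurrences of the codon in highly expressed genes
      b: occurrences of its synonymous alternatives in highly expressed genes
      c: occurrences of the codon in the comparison gene set
      d: occurrences of its synonymous alternatives in the comparison set
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b
    row2 = c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)
    numerator = sum(
        comb(row1, x) * comb(row2, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / denominator

# Made-up counts: the codon appears only in the highly expressed set.
p_value = fisher_exact_greater(3, 0, 0, 3)
print(p_value)
```

A small p-value suggests the codon is overrepresented in the highly expressed genes; testing underrepresentation would use the opposite tail.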
Open Reading Frames
An open reading frame is a portion of a DNA molecule that,
when translated into amino acids, contains no stop codons.
The genetic code is read in groups of three bases (codons),
which means that a double-stranded DNA molecule can be read
in any of six possible reading frames: three in the forward
direction and three in the reverse. A long open reading frame
is likely part of a gene.
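The six reading frames can be sketched in a few lines of Python. This is an illustrative example only: the function names and the frame labels (+1 to +3, -1 to -3) are assumptions, and the input is a short made-up sequence.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse strand read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def six_frames(dna):
    """Split a sequence into codons for frames +1..+3 and -1..-3."""
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            # Step through the strand three bases at a time from this offset,
            # keeping only complete codons.
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames

frames = six_frames("ATGAAATAG")
for name, codons in frames.items():
    print(name, codons)
```

Only frame +1 of this toy sequence reads ATG AAA TAG, that is, a start codon followed by a codon and a stop; the other five frames split the same bases differently.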
Target genes can be identified by searching online databases for long
stretches of DNA that could potentially code for protein.
• These sequences, called open reading frames (ORFs), begin with
a start codon and are uninterrupted by stop codons
• Open reading frames will typically consist of at least 100 codons
(300 nucleotides)
• Searches can be refined by looking at regions downstream of
known promoter sequences and upstream of termination sites
While open reading frames may predict potential coding regions, they
do not automatically guarantee the presence of a gene.
• Some long, uninterrupted DNA sequences may not actually be
translated, whilst some short sequences may code for protein
Identifying Open Reading Frames
To identify an open reading frame:
• Locate a sequence corresponding to a start codon in order
to determine the reading frame; this will be ATG (sense
strand)
• Read the sequence in base triplets until a stop codon is
reached (TGA, TAG or TAA)
• The longer the sequence, the greater the
likelihood that the open reading frame corresponds to a
gene
Certain bioinformatics programs can automatically identify
potential ORFs when provided with a candidate sequence.
• Gene sequences are largely conserved, so if an ORF
sequence is present in multiple genomes, it likely
represents a gene
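The steps above can be sketched as a short Python function. This is a minimal illustration, not the algorithm of any particular program: the function name is hypothetical, only the sense strand is scanned, and the default minimum of 100 codons follows the rule of thumb given earlier.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Scan the sense strand for ATG...stop open reading frames.

    Returns (start, end, sequence) tuples with a 0-based start and an
    exclusive end; the stop codon is included in the reported sequence.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            # Step 2: read in triplets until a stop codon is reached.
            j = i
            while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(dna):
                break  # ran off the end without meeting a stop codon
            # Step 3: keep only ORFs that are long enough.
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3, dna[i:j + 3]))
            i = j + 3  # resume scanning after the stop codon
    return orfs

# Toy sequence, so the length threshold is lowered for the demonstration.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=1))
```

With the default threshold the toy sequence would return nothing, which reflects the point that very short ORFs are usually not reported.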
Identification of an Open Reading Frame
The Open Reading Frame Finder
ORF finder searches for open reading frames (ORFs) in the DNA
sequence you enter. The program returns the range of each ORF,
along with its protein translation. Use ORF finder to search newly
sequenced DNA for potential protein-encoding segments, and verify
the predicted proteins using SMART BLAST or regular
BLASTP.
Example searches:
• NC_011604, Salmonella enterica plasmid pWES-1; genetic code:
11; 'ATG' and alternative initiation codons; minimal ORF length: 300
nt
• NM_000059; genetic code: 1; start codon: 'ATG only'; minimal ORF
length: 150 nt
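The search parameters in these examples (which start codons to accept and the minimal ORF length) can be illustrated with a small Python sketch. It is not ORF finder's implementation: the function name is hypothetical, the alternative start codons GTG and TTG are an illustrative choice rather than the set the tool uses, and counting the stop codon towards the length is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def search_orfs(dna, start_codons=("ATG",), min_length_nt=300):
    """Report ORFs on both strands as (strand, frame, start, end) tuples.

    start and end are 0-based positions on the scanned strand (end is
    exclusive). The length check includes the stop codon (an assumption).
    """
    hits = []
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    for strand_name, strand in (("+", forward), ("-", reverse)):
        for frame in range(3):
            start = None  # position of the start codon of the open ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon in start_codons:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length_nt:
                        hits.append((strand_name, frame + 1, start, i + 3))
                    start = None
    return hits

toy = "AAGTGCCCAAATAGAA"  # made-up sequence with a GTG-initiated ORF
print(search_orfs(toy, start_codons=("ATG",), min_length_nt=9))
print(search_orfs(toy, start_codons=("ATG", "GTG", "TTG"), min_length_nt=9))
```

Allowing alternative initiation codons reveals an ORF that an 'ATG only' search misses, while raising the minimal length filters out short hits.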
Use this link to attempt an illustration:
https://www.ncbi.nlm.nih.gov/orffinder/
