LECTURE 7.pptx

Lecture 7: Codon Usage analysis and plotting open reading frames
Dr. Naulikha Kituyi
Department of Biological Sciences
University of Embu
Biochemistry, 2021

Codon Usage
SELF-TEST QUESTIONS
• Explain why some genes may not be fully expressed in certain host organisms when using recombinant DNA technology.
• State the relevance of ORF finding in bioinformatics.
Codon usage refers to the frequency with which a particular organism uses the available codons in its genes. The majority of amino acids are coded for by more than one codon (see the genetic code), and there are marked preferences for the use of the alternative codons among different species. For example, in bacteria such as Escherichia coli, CCG is the preferred codon for the amino acid proline, rather than CCU, CCC or CCA. Codon usage can affect the efficiency of translation, particularly when cloned heterologous genes are being expressed. If the cloned gene contains a large number of unfavoured codons, the host may have too few of the tRNA molecules that recognize them, which slows translation and hence reduces the amount of protein synthesized.
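Codon usage can be tabulated by splitting a coding sequence into consecutive triplets and counting each one. The Python sketch below is a minimal illustration, assuming an in-frame coding sequence that starts at its first base; the function name `codon_usage` and the example sequence are hypothetical, and frequencies are reported per 1,000 codons, a common convention in codon usage tables.

```python
from collections import Counter

def codon_usage(cds: str) -> dict[str, float]:
    """Return the frequency of each codon per 1,000 codons in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")  # accept mRNA (U) as well as DNA (T)
    # Split into consecutive, non-overlapping triplets; drop any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: 1000 * n / len(codons) for codon, n in sorted(counts.items())}

# Hypothetical example: a short ORF containing two proline codons (CCG and CCA).
for codon, per_thousand in codon_usage("ATGCCGCCAAAATAA").items():
    print(f"{codon}: {per_thousand:.1f} per 1,000 codons")
```

Summing such counts over many genes of one organism gives its codon usage table; comparing a cloned gene's codons against the host's table is one way to spot the unfavoured codons described above.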
Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons).
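One common way to put a number on this bias, not covered in the slides themselves, is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. An RSCU of 1 means no bias, above 1 means the codon is preferred, and below 1 means it is avoided. The sketch assumes the standard genetic code (NCBI table 1) and an in-frame sequence; `rscu` is a hypothetical name.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated with bases in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for every codon of each amino acid present."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODE)
    # Group codons into synonymous families by the amino acid (or stop, '*') they encode.
    families: dict[str, list[str]] = {}
    for codon, aa in CODE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for aa, synonyms in families.items():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(synonyms)  # count expected under equal synonymous usage
        for c in synonyms:
            result[c] = counts[c] / expected
    return result

# Hypothetical example: three CCG and one CCA among four proline codons.
print({c: v for c, v in rscu("CCGCCGCCGCCA").items()})
```

Methionine (ATG) and tryptophan (TGG) each have a single codon, so their RSCU is always 1 whenever they occur.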

Illustration of Codon Usage bias
Although codon usage is not the only barrier to success in heterologous gene expression, checking it provides a quick way to assess the compatibility of your gene and expression system before you start. The diagram illustrates codon usage bias in P. patens compared with other selected organisms. Codon usage differs between the full set of genes and the subset of highly expressed genes. Codons significantly over-represented in highly expressed genes are colored dark green and marked with (+), and significantly under-represented codons are colored light blue and marked with (-) (adjusted p < 0.05, Fisher's exact test).
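The over- and under-representation calls above rest on Fisher's exact test applied to a 2×2 table of codon counts. The slides do not say exactly how that table was built, so the layout here is an assumption: rows are highly expressed genes versus a reference gene set, and columns are uses of the codon of interest versus uses of its synonymous alternatives. This pure-Python sketch computes a two-sided p-value only; the multiple-testing adjustment behind "adjusted p" is not reproduced, and the counts in the example are invented.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = highly expressed genes, reference genes;
    columns = uses of the codon of interest, uses of its synonymous alternatives.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x with all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, row1 - (n - col1)), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-7):  # tables no more probable than the one observed
            p_value += p
    return min(1.0, p_value)

# Hypothetical counts: the codon is used 120 times (vs 80 synonymous alternatives) in highly
# expressed genes, and 900 times (vs 1,100 alternatives) in the reference gene set.
print(f"p = {fisher_exact_two_sided(120, 80, 900, 1100):.3g}")
```

In practice a library routine such as `scipy.stats.fisher_exact` performs the same test; the explicit version shows that the p-value is a sum of hypergeometric probabilities over tables with the same margins.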

Open Reading Frames
An open reading frame is a portion of a DNA molecule that, when translated into amino acids, contains no stop codons. Because the genetic code is read in groups of three bases, a double-stranded DNA molecule can be read in any of six possible reading frames: three in the forward direction and three in the reverse. A long open reading frame is likely to be part of a gene.
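The six reading frames can be generated by taking the sequence and its reverse complement and starting the triplet split at offsets 0, 1 and 2 on each. In this minimal sketch the function names are hypothetical, and the labels +1 to +3 and -1 to -3 follow one common convention (offset within the forward strand or within the reverse complement); other tools may number the reverse frames differently.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (the opposite strand, read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]

def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a DNA sequence into codons in all six reading frames (+1..+3, -1..-3)."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Only complete triplets are kept; a partial codon at the end is dropped.
            frames[f"{strand}{offset + 1}"] = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
    return frames

for label, codons in six_frames("ATGAAATGA").items():
    print(label, " ".join(codons))
```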
Target genes can be identified by searching online databases for long stretches of DNA that could potentially code for protein.
• These sequences, called open reading frames (ORFs), will be preceded by a start codon and uninterrupted by stop codons.
• Open reading frames will typically consist of at least 100 codons (300 nucleotides).
• Searches can be refined by looking at regions downstream of known promoter sequences and upstream of termination sites.
While open reading frames may predict potential coding regions, they do not automatically guarantee the presence of a gene.
• Some long, uninterrupted DNA sequences may never be translated, whilst some short sequences do code for protein.
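The 100-codon rule of thumb can be motivated with a short calculation. Assuming random DNA with equal base frequencies (a simplification real genomes do not meet), 3 of the 64 codons are stops, so each codon avoids being a stop with probability 61/64, and a run of n stop-free codons has probability (61/64)^n.

```python
# Assumption: random DNA with equal base frequencies, so each of the 64 codons is equally likely.
STOP_CODONS = 3
ALL_CODONS = 64

def prob_no_stop(n_codons: int) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return ((ALL_CODONS - STOP_CODONS) / ALL_CODONS) ** n_codons

# Mean number of codons read up to and including the first stop (geometric distribution).
mean_run = ALL_CODONS / STOP_CODONS

print(f"P(100 random codons without a stop) = {prob_no_stop(100):.4f}")
print(f"Mean codons until the first stop = {mean_run:.1f}")
```

Under this assumption a random stretch hits a stop after about 21 codons on average, and only roughly 0.8% of random 100-codon stretches are stop-free, which is why a long ORF is unlikely to be a chance occurrence.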

Identifying Open Reading Frames
To identify an open reading frame:
• Locate a start codon to determine the reading frame; on the sense strand this will be ATG.
• Read the sequence in base triplets from this codon until a stop codon (TGA, TAG or TAA) is reached.
• The longer the resulting ORF, the more likely it is to be a genuine protein-coding sequence, because 3 of the 64 codons are stops and long stop-free stretches rarely arise by chance.
Certain bioinformatic programs can automatically identify potential ORFs when provided with a candidate sequence.
• Gene sequences are often conserved between species, so an ORF whose sequence is present in multiple genomes likely represents a gene.
Figure: Identification of an Open Reading Frame
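The manual steps above (find ATG, read triplets to a stop, prefer long ORFs) can be automated across all six frames. This is a minimal sketch, not the algorithm of any particular ORF-finding program: `find_orfs` is a hypothetical name, only ATG is accepted as a start, only the first ATG in each stop-free stretch is used (so nested ORFs sharing a stop are reported once, as the longest), and coordinates on the "-" strand refer to the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Find ATG-initiated ORFs ending in a stop codon on both strands.

    Returns (strand, start, end, n_codons) with 0-based, end-exclusive coordinates
    on the strand that was scanned; n_codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None  # position of the first ATG in the current stop-free stretch
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # step 1: a start codon fixes the reading frame
                elif start is not None and codon in STOP_CODONS:
                    n = (i + 3 - start) // 3  # step 2: triplets read up to and including the stop
                    if n >= min_codons:
                        orfs.append((strand, start, i + 3, n))
                    start = None
    # Step 3: longer ORFs are more likely to be real genes, so list them first.
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)

print(find_orfs("CCATGAAACCCTAGGG", min_codons=1))
```

The default `min_codons=100` mirrors the 100-codon guideline given earlier; lowering it, as in the usage line, is only useful for toy sequences.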

The Open Reading Frame Finder
ORF finder searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF finder to search newly sequenced DNA for potential protein-encoding segments, and verify the predicted proteins using SmartBLAST or regular BLASTP.
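The protein translation that ORF finder reports for each ORF can be sketched as a codon-table lookup. This minimal version assumes the standard genetic code (NCBI translation table 1) and ATG starts only; it does not implement other genetic codes, such as code 11 used in the bacterial example search, or their alternative initiation codons. `translate` is a hypothetical helper, not part of ORF finder.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated with bases in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

def translate(orf: str, to_stop: bool = True) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # 'X' for codons with ambiguous bases such as N
        if aa == "*" and to_stop:
            break  # stop codon ends the ORF
        protein.append(aa)
    return "".join(protein)

print(translate("ATGCCGAAATGGTAA"))  # -> MPKW
```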
Example searches:
• NC_011604, Salmonella enterica plasmid pWES-1; genetic code: 11; start codons: 'ATG' and alternative initiation codons; minimal ORF length: 300 nt
• NM_000059; genetic code: 1; start codon: 'ATG only'; minimal ORF length: 150 nt
The query sequence can be entered as an accession number, gi, or nucleotide sequence in FASTA format, optionally restricted to a From/To range, before choosing the search parameters.
Use this link to try an example: https://www.ncbi.nlm.nih.gov/orffinder/
