DNA sequencing

What is DNA?
PURINE
PYRIMIDINE

DEOXYRIBOSE SUGAR
The deoxyribose sugar of the
DNA backbone has 5 carbons and
3 oxygens.
The carbon atoms are numbered
1', 2', 3', 4', and 5' to distinguish
from the numbering of the atoms of
the purine and pyrimidine rings.
The hydroxyl groups on the 5'-
and 3'- carbons link to the
phosphate groups to form the DNA
backbone.
Deoxyribose lacks a hydroxyl
group at the 2'-position when
compared to ribose, the sugar
component of RNA.

NUCLEOSIDES AND NUCLEOTIDES
A nucleoside is one of the four DNA bases covalently attached to the C1'
position of a sugar.
Nucleosides differ from nucleotides in that they lack phosphate groups.
The four different nucleosides of DNA are deoxyadenosine (dA),
deoxyguanosine (dG), deoxycytidine (dC), and (deoxy)thymidine (dT, or T).
In dA and dG, there is an "N-glycoside" bond between the sugar C1' and
N9 of the purine.
Nucleotides
A nucleotide is a nucleoside with one or more phosphate groups
covalently attached to the 3'- and/or 5'-hydroxyl group(s).

DNA BACKBONE
The deoxyribose sugars are joined at both
the 3'-hydroxyl and 5'-hydroxyl groups to
phosphate groups in ester links, also
known as "phosphodiester" bonds.
Features of the 5'-d(CGAAT)
structure:
Alternating backbone of
deoxyribose and
phosphodiester groups
Chain has a direction (known
as polarity), 5'- to 3'- from top
to bottom
Oxygens (red atoms) of
phosphates are polar and
negatively charged
A, G, C, and T bases can
extend away from chain, and
stack atop each other
Bases are hydrophobic

DNA Double Helix
Two DNA strands form a
helical spiral, winding around a
helix axis in a right-handed
spiral
The two polynucleotide chains
run in opposite directions
The sugar-phosphate
backbones of the two DNA
strands wind around the helix
axis like the railing of a spiral
staircase
The bases of the individual
nucleotides are on the inside of
the helix, stacked on top of
each other like the steps of a
spiral staircase.
dA-dT and dG-dC base pairs are the same length, and occupy the same
space within a DNA double helix. Therefore the DNA molecule has a
uniform diameter.
dA-dT and dG-dC base pairs can occur in any order within DNA
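Because the two strands run in opposite directions and pair dA with dT and dG with dC, the sequence of one strand determines the sequence of its partner. A minimal Python sketch of that rule follows; the function name `reverse_complement` is an illustrative choice, not something from the slides.

```python
# Watson-Crick pairing: dA pairs with dT, dG pairs with dC.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    The partner runs in the opposite direction, so the input is
    read backwards while each base is swapped for its pairing base.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))

print(reverse_complement("CGAAT"))  # ATTCG
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.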

DNA Helix Axis
The helix axis is most apparent from a view directly down the axis. The
sugar-phosphate backbone is on the outside of the helix where the polar
phosphate groups (red and yellow atoms) can interact with the polar
environment. The nitrogen-containing bases (blue atoms) are inside,
stacking perpendicular to the helix axis.

DNA SEQUENCING
Determining the number and order of nucleotides that make up a given molecule of
DNA.
How many base pairs (bp) are there in a human genome?
How much did it cost to sequence the first human genome?
How long did it take to sequence the first human genome?
When was the first human genome sequence complete?

1980 Nobel Prize for DNA Sequence Analysis
Maxam-Gilbert sequencing
(chemical cleavage method using double-stranded (ds) DNA).
Sanger-Coulson sequencing
(chain termination method using single-stranded (ss) DNA).

MAXAM-GILBERT SEQUENCING
It involves modification of the bases in DNA followed by chemical base-
specific cleavage.
Stages:
Double-stranded DNA to be sequenced is labelled by attaching a radioactive
phosphorus (32P) group to the 5' end, using polynucleotide kinase and
[γ-32P]ATP.
The two strands of the DNA are separated by heating to 90 °C in dimethyl
sulphoxide, and then purified.
Single-stranded sample is split into separate samples and each is treated with one
of the cleavage reagents. This part of the process involves alteration of bases
(e.g. dimethylsulphate methylates guanine) followed by removal of altered bases.
Lastly, piperidine is used for cleavage of the strand at the points where bases are
missing.
Base specificity | Chemical used for base alteration | Chemical used for altered base removal | Chemical used for strand cleavage
G                | Dimethylsulphate                  | Piperidine                             | Piperidine
A+G              | Acid                              | Acid                                   | Piperidine
C+T              | Hydrazine                         | Piperidine                             | Piperidine
C                | Hydrazine + alkali                | Piperidine                             | Piperidine
A>C              | Alkali                            | Piperidine                             | Piperidine
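The way a sequence is read from the base-specific reactions can be sketched in Python. This is an idealised illustration only: it assumes every susceptible position is cut somewhere in the sample, it uses just the G, A+G, C+T and C reactions, and the names `cleavage_ladders` and `read_ladders` are hypothetical.

```python
# Bases attacked in each reaction (the A>C reaction is left out for simplicity).
REACTIONS = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}

def cleavage_ladders(seq):
    """For each reaction, collect the 1-based positions (counted from the
    labelled 5' end) at which the strand can be cut."""
    return {
        name: {i for i, base in enumerate(seq, start=1) if base in targets}
        for name, targets in REACTIONS.items()
    }

def read_ladders(ladders, length):
    """Infer the base at each position from which reactions cut there."""
    calls = []
    for pos in range(1, length + 1):
        if pos in ladders["G"]:
            calls.append("G")      # cut in both G and A+G
        elif pos in ladders["A+G"]:
            calls.append("A")      # cut in A+G only
        elif pos in ladders["C"]:
            calls.append("C")      # cut in both C and C+T
        elif pos in ladders["C+T"]:
            calls.append("T")      # cut in C+T only
        else:
            calls.append("N")      # no band: base unknown
    return "".join(calls)

ladders = cleavage_ladders("CGAAT")
print(read_ladders(ladders, 5))  # CGAAT
```

The point of the overlapping specificities is visible in `read_ladders`: A and T are never cut on their own, so they are identified by a band in the combined reaction with no matching band in the single-base reaction.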

SANGER-COULSON SEQUENCING
This chain termination method uses single-stranded (ss) DNA, which is
usually cloned in M13 phage vector.
The method is based on the interruption by nucleotide analogues of
enzymatic synthesis of a second strand of DNA complementary to the sample.
A mixture of different length fragments is produced depending where the
interruptions occurred.
As with the Maxam-Gilbert method, the mixture of fragments is run on a gel
and the

STEPS
The DNA is cloned into a phage or plasmid vector.
A short oligonucleotide primer complementary to the vector sequence is
annealed as the starting point.
DNA polymerase (e.g. the Klenow fragment of DNA polymerase I or something
similar) is then added in the presence of normal dNTPs and dideoxy
nucleotides (ddNTPs).
ddNTPs are identical to normal nucleotides except that the 3'-hydroxyl group
(OH) of the sugar ring is replaced with a hydrogen (H), so no further
nucleotide can be attached.
The ratio of dNTPs:ddNTPs is approximately 100:1.
Complementary strand synthesis proceeds away from the primer in the 5' to 3'
direction.
Once an analogue is incorporated, chain termination occurs.
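The effect of the roughly 100:1 dNTP:ddNTP ratio can be estimated with a short calculation. The sketch below assumes the polymerase picks the analogue with probability 1/(ratio + 1) at every position where it could be incorporated, which ignores any enzyme preference for normal nucleotides; the function names are illustrative.

```python
def termination_probability(ratio=100.0):
    """Chance that a ddNTP rather than a dNTP is incorporated at one position,
    assuming the polymerase does not discriminate between them."""
    return 1.0 / (ratio + 1.0)

def fraction_terminated_at(n, ratio=100.0):
    """Fraction of chains that stop exactly at the n-th eligible position."""
    p = termination_probability(ratio)
    return (1.0 - p) ** (n - 1) * p

def fraction_longer_than(n, ratio=100.0):
    """Fraction of chains still growing after n eligible positions."""
    p = termination_probability(ratio)
    return (1.0 - p) ** n

for n in (1, 50, 200):
    print(n, round(fraction_terminated_at(n), 5), round(fraction_longer_than(n), 3))
```

Under this assumption a low ddNTP proportion spreads terminations over many positions, which is why fragments of many different lengths are produced.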

The 4 mixes are
run side by side on a
sequencing gel which
separates the
fragments by
electrophoresis
depending on their
size.
Urea, high voltage
Autoradiography
The DNA sequence is
read directly from the
gel
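Reading the gel can be sketched as follows. This is a simplified model that assumes one terminated fragment is present for every position, measures fragment length from the end of the primer, and uses hypothetical names (`sanger_lanes`, `read_gel`).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_lanes(template_3_to_5):
    """Return, for each ddNTP lane, the lengths of the terminated fragments.

    The template is given 3' to 5', so the new strand grows 5' to 3'
    as the template is read from left to right.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, template_base in enumerate(template_3_to_5, start=1):
        new_base = COMPLEMENT[template_base]
        lanes[new_base].append(length)  # chain ended by dd<new_base> at this length
    return lanes

def read_gel(lanes):
    """Read bands from shortest to longest fragment across all four lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = sanger_lanes("GCTTA")
print(lanes)            # {'A': [3, 4], 'C': [1], 'G': [2], 'T': [5]}
print(read_gel(lanes))  # CGAAT
```

The sequence read from the gel is that of the newly synthesised strand, i.e. the complement of the template.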

AUTOMATED DNA SEQUENCING
This can be carried out using capillary array electrophoresis.
This method was developed for the Human Genome Project and
greatly sped up its completion. It is based on the Sanger-Coulson
chain termination method but the 4 different dideoxy nucleotides (ddA,
ddC, ddG and ddT) are fluorescently labelled (fluorophores) not
radioactively labelled.
Base | Dye    | Wavelength (nm)
A    | dR6G   | 570
G    | dROX   | 620
C    | dR110  | 540
T    | dTAMRA | 600
Since 4 different fluorophores are used, all 4 reactions can be run in the
same tube, greatly increasing the speed and ease of sequencing.
After restriction, DNA fragments are separated by capillary electrophoresis
using small (approx. 100 microns in diameter), gel-filled capillary tubes,
clustered together and read with a laser scanning system.
Electropherogram: As each capillary tube is moved into the path of the laser
beam, fluorescently labelled nucleotides are detected one at a time,
producing a coloured electropherogram
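In its simplest form, turning an electropherogram into a sequence means picking the strongest of the four dye signals at each peak. The sketch below assumes the peaks have already been located and the four channels separated; the intensity values are made up for illustration, and real base callers also correct for dye mobility and spectral overlap.

```python
CHANNELS = ("A", "C", "G", "T")

def call_bases(peaks):
    """Call one base per peak by choosing the channel with the highest signal."""
    calls = []
    for peak in peaks:
        strongest = max(CHANNELS, key=lambda base: peak[base])
        calls.append(strongest)
    return "".join(calls)

# Illustrative (invented) fluorescence intensities for three successive peaks.
example_peaks = [
    {"A": 30, "C": 12, "G": 410, "T": 25},
    {"A": 380, "C": 20, "G": 18, "T": 40},
    {"A": 15, "C": 35, "G": 22, "T": 395},
]
print(call_bases(example_peaks))  # GAT
```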

Next Generation Sequencing
Next-generation sequencing refers to non-Sanger-based high-throughput DNA
sequencing technologies.
Millions or billions of DNA strands can be sequenced in parallel, yielding
substantially more throughput and minimizing the need for the fragment-
cloning methods that are often used in Sanger sequencing of genomes.
• Roche/454 FLX
• Illumina/Solexa Genome Analyzer
• Applied Biosystems SOLiD™ System
• Life Technologies
• Pacific Biosciences
• Ion Torrent
• Oxford Nanopore

Next-generation DNA sequencing instruments
• All commercially-available sequencers have the following shared attributes:
• Random fragmentation of starting DNA, ligation with custom linkers = “a
library”
• Library amplification on a solid surface (either bead or glass)
• Direct step-by-step detection of each nucleotide base incorporated during the
sequencing reaction
• Hundreds of thousands to hundreds of millions of reactions imaged per
instrument run = “massively parallel sequencing”
• Shorter read lengths than capillary sequencers
• A “digital” read type that enables direct quantitative comparisons
• A sequencing mechanism that samples both ends of every fragment sequenced
(“paired end” reads)

Paired-end reads
• All next-gen platforms now offer paired end read capability, i.e. sequences
can be derived from both ends of the library fragments.
• Differences exist in the distance between read pairs, based on the
approach/platform.
• “paired ends” : linear fragment sequenced at both ends in two separate
reactions
• “mate pairs” : circularized fragment of >1kb, sequenced by a single
reaction read or by two separate end reads (platform dependent)
• In general, paired end reads offer advantages for sequencing large and
complex genomes because they can be more accurately placed (“mapped”)
than can single ended short reads.
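A paired-end read pair can be pictured as one read from each end of a linear fragment, each read 5' to 3' on its own strand. A minimal sketch, with `read_length` and the function names chosen for illustration:

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq))

def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2) for a linear library fragment.

    Read 1 starts at the 5' end of the given strand; read 2 starts at the
    5' end of the opposite strand, i.e. the other end of the fragment.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2

print(paired_end_reads("ACGTTGCAAGGCT", 4))  # ('ACGT', 'AGCC')
```

Because the two reads are known to come from the same fragment, a mapper can use their expected orientation and spacing to place them more confidently than either read alone.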

Pyrosequencing
Genome Sequencing Utilizing Light-Emitting Luciferase and emPCR
The 454 pyrosequencing technology was developed by 454 Life Sciences
Based on the "sequencing by synthesis" principle which involves utilizing
single strand DNA, to be sequenced, and sequencing its complementary
strand with enzymatic action
The "sequencing by synthesis" principle also relies on the detection of
pyrophosphate (PPi) released on nucleotide incorporation, generating a
light signal, rather than chain termination with dideoxynucleotides.

Requirements
Pyrosequencing requires 4 enzymes, dNTPs, 2 substrates (adenosine 5'
phosphosulfate (APS) & luciferin), a single strand of DNA and a light-sensitive
camera.
The 4 required enzymes are:
DNA Polymerase: used to elongate the complementary DNA strand.
ATP Sulfurylase: converts pyrophosphate (PPi) into ATP in the presence of
adenosine 5' phosphosulfate (APS).
Luciferase: uses ATP to convert luciferin to oxyluciferin, which is a molecule
that emits visible light.
Apyrase: degrades unincorporated nucleotides.

Pyrosequencing Biochemistry
• In DNA synthesis, a dNTP is attached to the 3’
end of the growing DNA strand releasing
pyrophosphate (PPi).
• ATP sulfurylase uses PPi and adenosine 5’-
phosphosulfate to make ATP.
• ATP sulfurylase is normally used in sulfur
assimilation: it converts ATP and inorganic sulfate
to adenosine 5’-phosphosulfate and PPi.
However, the reaction is reversed in
pyrosequencing.
• Luciferase uses luciferin and ATP as substrates,
converting luciferin to oxyluciferin and
releasing visible light.
• The amount of light released is proportional
to the number of nucleotides added to the
new DNA strand.
• After the reaction has completed, apyrase is
added to destroy any leftover dNTPs.
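Since the light signal is proportional to the number of nucleotides added, each nucleotide flow yields a number rather than a single base call. The sketch below simulates ideal, noise-free signals; the flow order "TACG" and the names `flowgram` and `decode` are assumptions made for illustration.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=400):
    """Simulate ideal light signals, one (base, count) pair per nucleotide flow.

    new_strand is the strand being synthesised, written 5' to 3'.
    A count of 0 means the flowed nucleotide was not incorporated.
    """
    signals = []
    pos = 0
    for flow in range(max_flows):
        if pos >= len(new_strand):
            break
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

def decode(signals):
    """Convert flow-space signals back into a nucleotide sequence."""
    return "".join(base * count for base, count in signals)

signals = flowgram("TTAGGGC")
print(signals)
# [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(decode(signals))  # TTAGGGC
```

With real data the counts are estimated from light intensity, so long runs of the same base are the hardest signals to quantify.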

• DNA is fragmented by nebulization.
• The ends of the DNA strands are made blunt with appropriate
enzymes.
• “A” and “B” adapters are ligated to the blunt ends using DNA
ligase.
• The strands are denatured using sodium hydroxide to release
the ssDNA template library (sstDNA).
• The A and B adapters are used as priming sites for both
amplification and sequencing since their composition is
known.
• The B adapter contains a 5’ biotin tag used for
immobilization.
• The beads are magnetic and streptavidin-coated, so they bind the biotin in the B
adaptors.
• The ratio of beads to DNA molecules is controlled so
that most beads get only a single DNA attached to them.
• EmPCR is performed and each bead ends up coated
with about a million identical copies of the original
DNA.
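Why the bead-to-DNA ratio matters can be illustrated with a Poisson model of how many DNA molecules land on each bead. The Poisson assumption and the example ratio of 0.1 molecules per bead are illustrative choices, not values from the slides.

```python
import math

def poisson(k, mean):
    """Probability of exactly k events when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def bead_loading(molecules_per_bead):
    """Fractions of beads carrying zero, one, or more than one DNA molecule."""
    empty = poisson(0, molecules_per_bead)
    single = poisson(1, molecules_per_bead)
    return {"empty": empty, "single": single, "multiple": 1.0 - empty - single}

loading = bead_loading(0.1)
loaded = 1.0 - loading["empty"]
print({name: round(value, 4) for name, value in loading.items()})
print("single among loaded beads:", round(loading["single"] / loaded, 3))
```

Under this model a low ratio leaves many beads empty, but nearly all beads that do carry DNA carry a single molecule, so each bead amplifies one template.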

• After the emulsion PCR has been performed,
the oil is removed, and the beads are put into a
“picotiter” plate. Each well is just big enough
to hold a single bead.
• The pyrosequencing enzymes are attached to
much smaller beads, which are then added to
each well.
• The plate is then repeatedly washed with
each of the four dNTPs, plus other necessary
reagents, in a repeating cycle.
• The plate is coupled to a fiber optic chip. A
CCD camera records the light flashes from
each well.

DISADVANTAGES
Data read in flow space rather than nucleotide space
Short read length
Poor performance in repeat regions

Sequencing by synthesis with reversible dye terminators

Third generation sequencers??
• Recently, new sequencing platforms were introduced.
• The Pacific Biosciences sequencer is a single molecule detection system that
marries nanotechnology with molecular biology.
• The Ion Torrent uses pH rather than light to detect nucleotide incorporations.
• The MiSeq is a scaled down version of the HiSeq, with faster chemistry and
scanning.
• All offer a faster run time, lower cost per run, reduced amount of data generated
relative to 2nd Gen platforms, and the potential to address genetic questions in the
clinical setting

Overview of the SMRT sequencing technology. (A) A complex of a DNA template and
active polymerase is immobilized at the bottom of the ZMW. (B) Phospholinked
nucleotides are introduced into the chamber of the ZMW. (C) Each of the four
nucleotides is labeled with a different colored fluorophore. (D) As a base is held in the
detection volume, a light pulse is produced. (E) Nucleotides held by the polymerase prior
to incorporation emit an extended signal that identifies the base being incorporated. (F)

Platform                       | Chemistry             | Read length (bp)      | Run time  | Gb/run | Advantage                        | Disadvantage
GS FLX Titanium XL+ (Roche)    | Pyrosequencing        | 700                   | 23 hrs    | 0.7    | Long read length                 | High error rate
HiSeq 2500/2000 (Illumina)     | Reversible terminator | 2 x 100               | 2-11 days | 600    | High throughput                  | Short reads and long run time
5500xl W (SOLiD, Life Tech)    | Ligation              | 2 x 50                | 8 days    | 320    | Low error rate                   | Short reads and long run time
Pacific Biosciences RS         | Real time sequencing  | 3,000 up to 20,000    | 20 mins   | 3      | No artifact, longest read length | High error rate
Complete Genomics              | Probe-anchor ligation | 2 x 35                | 12 days   | 60     | Low error rate                   | Short reads and long run time
Comparison of the five dominant next generation sequencing platforms

Some terms
Bacterial artificial chromosome (BAC): bacterial DNA spliced with a
medium-sized fragment of a genome (100 to 300 kb) to be amplified in
bacteria and sequenced.
Contig: Contiguous sequence of DNA created by assembling overlapping
sequenced fragments of a chromosome (whether natural or artificial, as in
BACs)
Draft sequence: Sequence with lower accuracy than a finished sequence;
some segments are missing or in the wrong order or orientation
Scaffold: A series of contigs that are in the right order but are not
necessarily connected in one continuous stretch of sequence
Genome: Entire genetic material of an organism
Genetic map: Determination of the relative positions of genes on a DNA
molecule (chromosome or plasmid) and of the distance, in linkage units or
physical units, between them.
Physical map: shows the physical locations of genes and other DNA
sequences of interest. Physical maps are used to help scientists identify
and isolate genes by positional cloning.

Whole Genome Clone by Clone Sequencing
Genomic DNA is partially digested with a restriction enzyme (RE)
The fragments are cloned into BACs
Clones are physically mapped with RE digests or sequence-tagged sites (STS)
The minimum tiling path is derived
The BACs to be sequenced are selected
The selected BACs are subcloned
Sequencing and assembly
One weakness of clone-based physical mapping is that the maps often have
poor continuity
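Deriving a minimum tiling path can be viewed as choosing the fewest overlapping clones that span a mapped region. The sketch below uses a standard greedy interval-cover approach on hypothetical clone coordinates; it is not the procedure used by any particular genome project, and it treats clones that merely touch as overlapping.

```python
def minimal_tiling_path(clones, start, end):
    """Pick the fewest clones covering the map interval [start, end].

    clones: list of (name, clone_start, clone_end) in map coordinates.
    Greedy rule: among clones starting at or before the covered point,
    take the one reaching furthest to the right.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical BAC positions in kb.
bacs = [("BAC1", 0, 120), ("BAC2", 50, 150), ("BAC3", 100, 260),
        ("BAC4", 140, 300), ("BAC5", 250, 400)]
print(minimal_tiling_path(bacs, 0, 400))  # ['BAC1', 'BAC3', 'BAC5']
```

The `ValueError` branch corresponds to the continuity problem noted above: if the physical map has a gap, no tiling path exists across it.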

Whole Genome Shotgun Sequencing
Small insert libraries are prepared
Random fragments are sequenced until ~5-fold or higher coverage is reached
Sequences are then assembled
Gaps are identified and closed
Finally, annotation is conducted
No need for minimal tiling paths
Reduced cost and effort
Greater computational effort
Lower sequence quality
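For the shotgun approach, fold coverage is the total number of sequenced bases divided by the genome size. The sketch below also estimates the fraction of the genome left unsequenced using a Poisson approximation (in the spirit of the Lander-Waterman model), which is an assumption added here; the genome size and read length are example values.

```python
import math

def coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def expected_fraction_uncovered(fold_coverage):
    """Poisson estimate of the fraction of bases hit by no read at all."""
    return math.exp(-fold_coverage)

genome_size = 5_000_000   # example: a 5 Mb genome
read_length = 500         # example read length in bases
print(reads_needed(5, read_length, genome_size))   # 50000
print(round(expected_fraction_uncovered(5), 4))    # 0.0067
```

Even at 5-fold coverage some bases are expected to be missed by chance, which is one reason a gap-closing step follows assembly.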

Human Genome Sequencing
>20,000 large bacterial artificial chromosome (BAC) clones that each
contained an approximately 100-kb fragment of the human genome
provided the tiling path
In BAC-based sequencing, each BAC clone is amplified in bacterial culture,
isolated in large quantities, and sheared to produce size-selected pieces of
approximately 2–3 kb.
These pieces are subcloned into plasmid vectors, amplified in bacterial culture,
and the DNA is selectively isolated prior to sequencing.
By generating approximately eightfold oversampling (coverage) of each
BAC clone in plasmid subclone equivalents, computer-aided assembly can
largely recreate the BAC insert sequence in contigs
Subsequent refinement, including gap closure and sequence quality
improvement (finishing), produces a single contiguous stretch of high-quality
sequence (typically with less than 1 error per 40,000 bases).

The transcriptome is the complete set of transcripts in a cell, both in terms of type
and quantity
A cDNA library is a combination of cloned cDNA fragments inserted into a
collection of host cells, which together constitute some portion of the transcriptome
of the organism
cDNA is produced from fully transcribed mRNA found in the nucleus and
therefore contains only the expressed genes of an organism
cDNA library has a tissue or cell specificity

cDNA sequencing is considered as an efficient means to obtain functional genomic
data for non-model organisms or for those with genome characteristics prohibitive
to whole genome sequencing
•To reveal the gene profiles present in a variety of species
•To identify tissue-specific alternative splicing
•To specify novel genes and transcripts
•To elucidate genomic structural variations prior to whole genome sequencing
The identification of exons and introns
The mapping of their boundaries
The identification of the 5’ and 3’ends of genes
The identification of transcription start sites

mRNA isolation, purification
Check the RNA integrity
Fractionate and enrich mRNA
Synthesis of cDNA
Treatment of cDNA ends
Ligation to vector

• Most eukaryotic mRNAs are polyadenylated at their 3’ ends
• oligo (dT) can be bound to the poly(A) tail and used to recover
the mRNA.
[Figure: mRNA with a 5’ cap and a 3’ poly(A) tail, AAAAAAAAAA(n)]

Three methods to isolate mRNA
1. The traditional method is to pass a preparation of total
RNA down a column of oligo(dT)-cellulose
2. A more rapid procedure is to add oligo(dT) linked to
magnetic beads directly to a cell lysate and ‘pull out’ the
mRNA using a strong magnet
3. An alternative is to lyse the cells and then prepare mRNA-
ribosome complexes on sucrose gradients

Check the mRNA integrity
Make sure that the mRNA is not degraded.
Translate the mRNA: use a cell-free translation system such as wheat
germ extract or rabbit reticulocyte lysate to see if the mRNAs
can be translated
Analyse the mRNAs by gel electrophoresis: use agarose or
polyacrylamide gels
Fractionate on the gel: performed on the basis of size; mRNAs of
the sizes of interest are recovered from agarose gels
Enrichment: carried out by hybridization

Synthesis of cDNA:
First strand synthesis: requires reverse
transcriptase, a primer (oligo(dT) or hexanucleotides) and
dNTPs
Second strand synthesis: the best way of
making full-length cDNA is to ‘tail’ the 3’-end of the first
strand and then use a complementary primer
to make the second.
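First-strand synthesis can be sketched as copying the mRNA into its complementary DNA, starting from an oligo(dT) primer on the poly(A) tail. In the sketch below, the minimum tail length of 10 and both function names are hypothetical choices for illustration.

```python
# Base pairing between an RNA template and the DNA strand copied from it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def oligo_dt_can_prime(mrna, min_tail=10):
    """True if the mRNA ends in a poly(A) tail long enough for oligo(dT)."""
    return mrna.upper().endswith("A" * min_tail)

def first_strand_cdna(mrna):
    """First-strand cDNA, written 5' to 3'.

    Reverse transcriptase reads the mRNA 3' to 5', so the cDNA is the
    reverse complement of the mRNA in the DNA alphabet.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

mrna = "AUGGCC" + "A" * 10
print(oligo_dt_can_prime(mrna))  # True
print(first_strand_cdna(mrna))   # TTTTTTTTTTGGCCAT
```

The run of T at the 5' end of the cDNA is the oligo(dT) primer region paired with the poly(A) tail.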

• Long RNAs are first
converted into a library of
cDNA fragments
• Sequencing adaptors
(blue) are added to each
cDNA fragment
• A short sequence is
obtained from each cDNA
using high-throughput
sequencing technology.
• The resulting sequence
reads are aligned with the
reference genome or
transcriptome
• Classified as three types:
exonic reads, junction reads
and poly(A) end-reads
• Generate a base-

ISSUES
Depth of coverage depends on the sequenceability of the genome

Advantages of RNA-seq compared to other transcriptomics methods

Challenges for RNA-Seq
Library Construction
Larger RNA molecules must be fragmented into smaller pieces (200–500
bp) to be compatible with most deep-sequencing technologies
Common fragmentation methods include RNA fragmentation (RNA
hydrolysis or nebulization) and cDNA fragmentation (DNase I treatment or
sonication)
RNA fragmentation has little bias over the transcript body, but is depleted
for transcript ends compared with other methods
cDNA fragmentation is usually strongly biased towards the identification of
sequences from the 3′ ends of transcripts
Manipulations during library construction also complicate the analysis
of RNA-Seq results
Many short reads that are identical to each other can be obtained from
cDNA libraries; these could be a genuine reflection of abundant RNA species, or
they could be PCR artefacts
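Identical reads are easy to count, even though counting alone cannot tell an abundant transcript from a PCR artefact. A minimal sketch using exact sequence identity follows; the function name and example reads are illustrative.

```python
from collections import Counter

def duplicate_summary(reads):
    """Summarise how many reads are exact copies of another read."""
    counts = Counter(reads)
    total = len(reads)
    unique = len(counts)
    return {
        "total": total,
        "unique": unique,
        "duplicate_fraction": (total - unique) / total if total else 0.0,
        "most_common": counts.most_common(1),
    }

reads = ["ACGT", "ACGT", "ACGT", "TTGA", "CCAG"]
print(duplicate_summary(reads))
# {'total': 5, 'unique': 3, 'duplicate_fraction': 0.4, 'most_common': [('ACGT', 3)]}
```

A high duplicate fraction is a flag for closer inspection, for example by comparing replicates, not proof of an artefact.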

Bioinformatic challenges
Short transcriptomic reads also contain reads that span exon junctions
or that contain poly(A) ends — these cannot be analysed in the same
way
For genomes in which splicing is rare (for example, S. cerevisiae)
special attention only needs to be given to poly(A) tails and to a small
number of exon–exon junctions.
For complex transcriptomes it is more difficult to map reads that span
splice junctions, owing to the presence of extensive alternative splicing
and trans-splicing
For large transcriptomes, alignment is also complicated by the fact that
a significant portion of sequence reads match multiple locations in the
genome
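The multiple-location problem can be illustrated by searching a reference for every exact occurrence of a read. The sketch below considers only exact matches on one strand, which is far simpler than what real aligners do; the reference string and function names are invented for illustration.

```python
def find_all(reference, read):
    """Return every 0-based start position where the read matches exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping matches
    return positions

def classify_read(reference, read):
    """Label a read as unmapped, unique, or multi-mapped."""
    hits = find_all(reference, read)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multi-mapped"

reference = "ACGTACGTTTGACC"
for read in ("ACGT", "TTGA", "GGGG"):
    print(read, find_all(reference, read), classify_read(reference, read))
# ACGT [0, 4] multi-mapped
# TTGA [8] unique
# GGGG [] unmapped
```

Longer reads and paired-end reads reduce the share of multi-mapped reads because a longer stretch of sequence is less likely to occur more than once.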
Coverage versus cost
In general, the larger the genome, the more complex the transcriptome,
the more sequencing depth is required for adequate coverage
