Bikash Kar Nath
M.Sc. 1st Semester, MBBT
What is DNA sequencing?
What is Phylogeny?
DNA sequencing techniques.
Results of DNA sequencing.
Phylogenetic Interpretation of the results.
Construction of Phylogenetic Trees.
DNA sequencing is a scientific approach involving various
biochemical, biophysical and computational techniques to
determine the order of the nucleotide bases (adenine, guanine,
thymine and cytosine) in a molecule of DNA.
DNA sequencing techniques are key tools in many fields. A large
number of sciences benefit from these techniques, including
archaeology, anthropology, genetics, biotechnology, molecular
biology and forensic science.
According to modern evolutionary theory, all organisms on Earth have
descended from a common ancestor, which means that any set of species,
extant or extinct, is related. This relationship is called phylogeny, and is
represented by phylogenetic trees, which graphically depict the
evolutionary history of the species of interest.
Phylogenetic tree of various foxes
Frederick Sanger’s Chain termination or Dideoxy-nucleotide Sequencing.
Allan Maxam & Walter Gilbert’s Chemical Degradation Sequencing.
Lynx Therapeutics' Massively Parallel Signature Sequencing.
Ion semiconductor sequencing.
DNA nanoball sequencing.
Single-molecule real-time (SMRT) sequencing & single-molecule real-time RNAP sequencing.
Nanopore DNA sequencing.
Figure: Sanger chain termination sequencing. The unknown fragment, cloned to produce single-stranded DNA, carries a region of known sequence to which a complementary primer binds. Four separate reactions contain dNTPs (with 35S-labeled dATP) and one type of ddNTP each (ddGTP, ddATP, ddTTP or ddCTP); incorporation of a ddNTP stops chain synthesis. The shortest synthesized band corresponds to the 5’ end of the synthesized strand and the longest to the 3’ end, from which the sequence of the unknown fragment is read.
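To make the reading rule in the figure concrete, here is a minimal Python sketch of an idealized chain termination experiment. It assumes every possible termination product is formed and resolved as a band, ignores the primer length, and uses hypothetical function names (`synthesized_strand`, `chain_termination_fragments`, `read_gel`); it illustrates the logic only, not real reaction kinetics. Note that the read obtained is the synthesized strand, i.e. the complement of the template.

```python
# Idealized sketch of Sanger chain termination (assumption: every possible
# termination product is formed and resolved as a band; primer length ignored).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3_to_5: str) -> str:
    """Strand built by DNA polymerase (written 5'->3') on a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def chain_termination_fragments(template_3_to_5: str) -> dict:
    """Fragment lengths produced in each of the four reactions.

    The reaction containing ddX can stop synthesis wherever X is added,
    so it yields one fragment ending at every X in the synthesized strand.
    """
    strand = synthesized_strand(template_3_to_5)
    lanes = {"G": [], "A": [], "T": [], "C": []}  # ddGTP, ddATP, ddTTP, ddCTP lanes
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read bands from shortest (5' end) to longest (3' end) across all lanes."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = chain_termination_fragments("TACGGA")
print(lanes)            # {'G': [3], 'A': [1], 'T': [2, 6], 'C': [4, 5]}
print(read_gel(lanes))  # ATGCCT
```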
The DNA to be sequenced must first be radioactively labeled at one 5’ end, generally by treatment
with polynucleotide kinase and 32P-ATP.
The labeled DNA is then precipitated to remove any salts that might interfere with the cleavage.
Chemical treatment generates breaks at a small proportion of one or two of the four
nucleotide bases in each of four reactions (G, A+G, C, C+T). The purines (A+G) are
depurinated using formic acid, the guanines are methylated by dimethyl sulfate (DMS) and the
pyrimidines (C+T) are hydrolysed using hydrazine. The addition of salt (sodium chloride) to the
hydrazine reaction inhibits the reaction of thymine, giving the C-only reaction.
The modified DNAs are then cleaved by hot piperidine at the position of the modified base.
The concentration of the modifying chemicals is controlled to introduce on average one
modification per DNA molecule.
Thus a series of labeled fragments is generated, from the radiolabeled end to the first "cut"
site in each molecule. The fragments from the four reactions are electrophoresed side by side
on a denaturing polyacrylamide gel for size separation. To visualize the fragments, the gel is
exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding
to a radiolabeled DNA fragment, from which the sequence may be inferred.
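The rule for reading the four Maxam-Gilbert lanes can be sketched as follows: a band in both G and A+G means G, a band only in A+G means A, a band in both C and C+T means C, and a band only in C+T means T. The sketch assumes ideal, unambiguous bands and indexes each band by the position of the cleaved base counted from the labeled end; `simulate_lanes` and `read_lanes` are hypothetical helper names.

```python
# Hypothetical sketch: decode an idealized Maxam-Gilbert gel.
# Each lane is a set of positions (counted from the 32P-labeled 5' end)
# at which a base was modified and the chain cleaved.

LANES = ("G", "A+G", "C", "C+T")


def simulate_lanes(sequence: str) -> dict:
    """Positions that would give a band in each of the four reactions."""
    lanes = {name: set() for name in LANES}
    for position, base in enumerate(sequence, start=1):
        if base in "AG":  # formic acid depurination: A+G lane
            lanes["A+G"].add(position)
        if base == "G":  # dimethyl sulfate: G lane
            lanes["G"].add(position)
        if base in "CT":  # hydrazine: C+T lane
            lanes["C+T"].add(position)
        if base == "C":  # hydrazine + NaCl: C-only lane
            lanes["C"].add(position)
    return lanes


def read_lanes(lanes: dict, length: int) -> str:
    """Infer the base at each position from which lanes show a band."""
    bases = []
    for position in range(1, length + 1):
        if position in lanes["G"] and position in lanes["A+G"]:
            bases.append("G")
        elif position in lanes["A+G"]:
            bases.append("A")
        elif position in lanes["C"] and position in lanes["C+T"]:
            bases.append("C")
        elif position in lanes["C+T"]:
            bases.append("T")
        else:
            bases.append("N")  # no band: position cannot be called
    return "".join(bases)


print(read_lanes(simulate_lanes("GATCCTAG"), 8))  # GATCCTAG
```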
Massively Parallel Signature Sequencing :- MPSS was a bead-based method that used a complex
approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four
nucleotides. Subsequent development of less complex novel sequencing techniques made MPSS obsolete.
Polony sequencing :- It combined an in vitro paired-tag library with emulsion PCR, an automated
microscope, and ligation-based sequencing chemistry to sequence an E. coli genome at an accuracy
of > 99.9999% and a cost approximately 1/10 that of Sanger sequencing.
Pyrosequencing :- The method amplifies DNA inside water droplets in an oil solution (emulsion PCR),
with each droplet containing a single DNA template attached to a single primer-coated bead that then
forms a clonal colony. Pyrosequencing uses luciferase to generate light for detection of the individual
nucleotides added to the nascent DNA, and the combined data are used to generate sequence readouts.
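How pyrosequencing turns light signals into a sequence can be sketched with an idealized flowgram: nucleotides are dispensed one type at a time in a repeating order, and the signal for each dispensation is taken to equal the number of bases incorporated, so a run of identical bases gives a proportionally larger signal. The flow order `TACG`, the noise-free integer signals and the function names are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical, noise-free model: signal = number of identical bases added per flow.


def flowgram(read: str, flow_order: str = "TACG", cycles: int = 10) -> List[Tuple[str, int]]:
    """Simulate cyclic nucleotide dispensation over the strand being synthesized."""
    flows = []
    position = 0
    for _ in range(cycles):
        for nucleotide in flow_order:
            count = 0
            # The polymerase extends through a whole homopolymer run in one flow.
            while position < len(read) and read[position] == nucleotide:
                count += 1
                position += 1
            flows.append((nucleotide, count))
    return flows


def call_bases(flows: List[Tuple[str, int]]) -> str:
    """Convert (nucleotide, signal) pairs back into a sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = flowgram("AAGTC", cycles=2)
print(flows[:4])          # [('T', 0), ('A', 2), ('C', 0), ('G', 1)]
print(call_bases(flows))  # AAGTC
```

If `cycles` is too small the read is simply truncated; in real data the signal is a noisy light intensity, which is why long homopolymer runs are hard to call accurately.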
Single Molecule real time RNAP sequencing :- This method is based on RNA polymerase (RNAP),
which is attached to a polystyrene bead, while the distal end of the DNA being sequenced is attached
to another bead; both beads are placed in optical traps. As RNAP moves along the DNA during
transcription, the distance between the beads changes, and this change can be recorded at
single-nucleotide resolution. The sequence is deduced from four readouts, each recorded with a
lowered concentration of one of the four nucleotide types, so that the polymerase pauses where it
must incorporate the scarce nucleotide.
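Combining the four readouts can be sketched as follows. Each run, recorded with one nucleotide (ATP, UTP, GTP or CTP) at lowered concentration, is assumed to yield the set of template positions where the polymerase paused; merging the four sets gives the transcript. The pause maps here are idealized (error-free, 0-based positions) and `merge_pause_maps` is a hypothetical name.

```python
# Hypothetical sketch: merge four idealized pause maps into one transcript.
# RNA polymerase incorporates A, U, G and C, so the result is an RNA sequence.


def merge_pause_maps(pauses: dict, length: int) -> str:
    """pauses maps each limiting nucleotide to the 0-based positions where RNAP paused."""
    transcript = ["N"] * length  # N = no readout assigned a base here
    for nucleotide, positions in pauses.items():
        for position in positions:
            if transcript[position] not in ("N", nucleotide):
                transcript[position] = "?"  # conflicting readouts
            else:
                transcript[position] = nucleotide
    return "".join(transcript)


pauses = {"A": [0, 3], "U": [1], "G": [2], "C": [4]}
print(merge_pause_maps(pauses, 5))  # AUGAC
```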
The sample sequence is finally obtained as a graph of nucleotide peaks or as a sequence of false-colour nucleotide bases.
Comparative analysis of DNA can be used as an important tool to analyze phylogenetic relationships.
Multiple Sequence Alignment
A phylogenetic analysis requires DNA sequences from more than one species, followed by their
alignment.
Because the species are assumed to share a common ancestor, mismatches can be
interpreted as point mutations and gaps as indels (i.e. insertion or deletion mutations)
introduced in one or both lineages since they diverged from one another.
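Pairwise global alignment is the building block of most multiple-alignment tools, so a minimal Needleman-Wunsch sketch shows how mismatches and gaps arise. The scoring (match +1, mismatch -1, gap -1) is an arbitrary assumption, and real analyses would use a dedicated multiple-alignment program rather than this code.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return one optimal global alignment of a and b as two gapped strings."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right corner to recover the alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def count_events(top: str, bottom: str):
    """Mismatched columns (putative point mutations) and gap columns (putative indels)."""
    mismatches = sum(1 for x, y in zip(top, bottom) if x != y and "-" not in (x, y))
    gaps = sum(1 for x, y in zip(top, bottom) if "-" in (x, y))
    return mismatches, gaps


top, bottom = needleman_wunsch("ACGTACGT", "ACGACGT")
print(top)                        # ACGTACGT
print(bottom)                     # ACG-ACGT
print(count_events(top, bottom))  # (0, 1)
```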
It is still a long and expensive process to sequence the entire DNA of an organism
(its genome), and this has been done for only a few species. However, it is quite feasible to
determine the sequence of a defined region of a particular chromosome. At any given position
within such a sequence, the base found may vary between organisms.
The particular sequence found in a given organism is referred to as its haplotype, and a
comparative analysis of these sequences can be used to infer phylogenetic relationships.
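Finding the positions at which aligned haplotypes differ is a simple first step in such a comparison. The sketch below assumes the sequences are already aligned to equal length and treats a gap character as just another state; the species labels are hypothetical.

```python
# Hypothetical sketch: list the alignment columns where haplotypes differ.


def variable_sites(aligned: dict) -> list:
    """Return 0-based columns at which not all aligned sequences share the same base."""
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [col for col in range(length) if len({seq[col] for seq in sequences}) > 1]


haplotypes = {"species_1": "ACGTTA", "species_2": "ACGTCA", "species_3": "ATGTCA"}
print(variable_sites(haplotypes))  # [1, 4]
```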
Does a high degree of similarity mean that two DNA sequences have the
same meaning or function?
“There are many scientists today who question the evolutionary
paradigm and its atheistic philosophical implications”.
“There are not many scientists today who question the evolutionary
paradigm and its atheistic philosophical implications”.
These sentences have 97% similarity and yet have almost opposite meanings!
Thus homology between DNA sequences is often incorrectly inferred from sequence similarity alone.
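The 97% figure can be checked with a few lines of Python. Here similarity is measured as the length of the longest common subsequence divided by the length of the longer sentence, which for this pair equals the identity of the gapped alignment; this is only one of several ways of defining percent identity, chosen here as an assumption.

```python
# Sketch: percent identity of the two example sentences, measured as
# (longest common subsequence) / (length of the longer sentence).


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


s1 = ("There are many scientists today who question the evolutionary "
      "paradigm and its atheistic philosophical implications")
s2 = ("There are not many scientists today who question the evolutionary "
      "paradigm and its atheistic philosophical implications")

identity = lcs_length(s1, s2) / max(len(s1), len(s2))
print(f"{identity:.1%}")  # 96.6%
```

Four inserted characters ("not ") out of 119 are enough to invert the meaning while leaving identity at about 97%.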
The terms "percent homology" and "sequence similarity" are often used interchangeably, but they are
not equivalent. Just as similar anatomical structures may arise through convergent evolution (giving
rise to homoplastic organs), high sequence similarity may arise by chance, especially between short
sequences. Such sequences are similar but not homologous. Hence, establishing phylogenetic
relationships takes into account sequences conserved across all the sampled species and the
extent of their conservation.
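Why short sequences are prone to chance similarity can be shown with a binomial calculation. Assuming two unrelated, ungapped sequences with equal base frequencies, so that each position matches by chance with probability 1/4 independently of the others (a simplifying assumption), the probability of reaching at least 80% identity falls sharply with length.

```python
from math import comb

# Simplified model (assumption): unrelated sequences, no gaps, independent
# positions, each matching by chance with probability p_match = 0.25.


def prob_at_least(length: int, min_matches: int, p_match: float = 0.25) -> float:
    """P(at least min_matches identical positions) under a binomial model."""
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(min_matches, length + 1)
    )


short_p = prob_at_least(10, 8)    # >= 80% identity over 10 bases
long_p = prob_at_least(100, 80)   # >= 80% identity over 100 bases
print(f"{short_p:.2e}")  # 4.16e-04
print(f"{long_p:.2e}")
```

Roughly 4 in 10,000 random pairs of 10-base sequences reach 80% identity, whereas for 100-base sequences the probability is of the order of 10^-30, which is why high similarity over a short stretch is weak evidence of homology.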
To build phylogenetic trees, statistical methods are applied to determine the tree topology and calculate the
branch lengths that best describe the phylogenetic relationships of the aligned sequences in a dataset.
Many different methods for building trees exist, and no single method performs well for all types of trees.
Common approaches include the following:-
1. Distance-Matrix Methods:- They compute a matrix of pairwise “distances” between sequences, each
approximating the evolutionary distance between a pair, and build the tree from this matrix.
2. Discrete data methods:- They examine each column of a multiple sequence alignment separately
and search for the tree that best accommodates all this information. Because each column is evaluated
on its own, it is possible to trace the evolution of specific elements within a given sequence, such as
catalytic sites or regulatory regions.
3. Maximum Likelihood:- The maximum likelihood method uses a probabilistic model of nucleotide
substitution and searches for the tree that is most likely to have produced the observed sequences.
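As an example of a distance-matrix method, the sketch below computes Jukes-Cantor (JC69) corrected distances, d = -(3/4) ln(1 - 4p/3) where p is the proportion of differing sites, and clusters them with UPGMA. Both choices are assumptions made for illustration (neighbour-joining is another common distance method); the input must be aligned, gap-free sequences with p < 0.75, and the taxon names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (gap-free, equal-length input)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p: float) -> float:
    """JC69 correction for multiple substitutions; undefined for p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(seqs: dict) -> str:
    """Build a rooted tree in Newick format by UPGMA clustering."""
    dist = {
        frozenset((a, b)): jukes_cantor(p_distance(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    size = {name: 1 for name in seqs}      # number of leaves in each cluster
    height = {name: 0.0 for name in seqs}  # node height (half the merge distance)
    while len(size) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        h = dist[frozenset((a, b))] / 2
        merged = f"({a}:{h - height[a]:.4f},{b}:{h - height[b]:.4f})"
        # Distance from the merged cluster is the size-weighted average.
        for c in size:
            if c not in (a, b):
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        size[merged] = size.pop(a) + size.pop(b)
        height[merged] = h
    return next(iter(size)) + ";"


taxa = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC", "C": "AAAAACCCCC"}
print(upgma(taxa))  # ((A:0.0537,B:0.0537):0.2952,C:0.3489);
```

UPGMA assumes a constant rate of evolution (a molecular clock) across lineages, which is one reason other methods are often preferred for real data.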
Thus a phylogenetic tree is obtained, highlighting the relationships between various species and the extent to which they are related, thereby tracing their evolutionary history.
Phylogenetic Tree of the Canid family
Tracing the evolution of Man.
Tracing the evolution of biologically vital proteins.
Tracing the evolution of infectious pathogens.
Increasing the efficacy & efficiency of drugs by sample testing on
phylogenetically related species.
DNA sequencing combined with phylogenetic analysis is increasingly used in
virology laboratories to study the transmission of viruses.
Establishment of phylogenetic relationships between various species
on the basis of DNA sequencing provides a detailed and reliable
approach to tracing the evolutionary history of those species as well as
predicting their future evolutionary patterns. It is vital for
understanding rapidly evolving viruses and other infectious
pathogens, for combating the diseases they cause, and for
improving economically important organisms.