2. • DNA Sequencing – 1st generation, NGS, 3rd generation
• RNA Sequencing
• SNP
• Epigenetic markers
• Omics
Contents in Detail
3. Sequencing comprises the methods and
technologies used to determine the
order of the nucleotide bases (adenine,
guanine, cytosine, and thymine, or uracil in
place of thymine in RNA) in a
molecule of DNA or RNA.
4. History of DNA sequencing
1953: Discovery of the structure of the DNA double helix.
1972: Development of recombinant DNA technology.
1977: The first complete DNA genome to be sequenced is that of bacteriophage φX174. Frederick Sanger publishes "DNA sequencing with chain-terminating inhibitors".
1984: Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb.
1987: Applied Biosystems markets the first automated sequencing machine, the model ABI 370.
1990: The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on M. capricolum, E. coli, Caenorhabditis elegans and S. cerevisiae.
1995: Craig Venter, Hamilton Smith and colleagues publish the first complete genome of a bacterium, H. influenzae, obtained by whole-genome shotgun sequencing.
1996: Pål Nyrén and his student Mostafa Ronaghi at the Royal Institute of Technology in Stockholm publish their method of pyrosequencing.
1998: Phil Green and Brent Ewing of the University of Washington publish "phred" for sequencer data analysis.
2001: A draft sequence of the human genome is published.
2004: 454 Life Sciences markets a parallelized version of pyrosequencing.
2006: Era of next-generation sequencing: 454 sequencing, Illumina, etc.
6. Maxam-Gilbert sequencing is performed by chain
breakage at specific nucleotides.
[Figure: Maxam-Gilbert Sequencing. Base-specific cleavage reactions: DMS (G), FA (G+A), H (C+T), H+S (C).]
7. The method requires radioactive labeling at one 5' end of the
DNA (typically by a kinase reaction using gamma-32P ATP)
and purification of the DNA fragment to be sequenced.
Chemical treatment generates breaks at a small proportion of
one or two of the four nucleotide bases in each of four
reactions (G, A+G, C, C+T).
For example, the purines (A+G) are depurinated using
formic acid, the guanines (and to some extent the adenines) are
methylated by dimethyl sulfate, and the pyrimidines (C+T) are
modified using hydrazine.
The addition of salt (sodium chloride) to the hydrazine
reaction inhibits the reaction of thymine, giving the C-only
reaction.
8. The modified DNAs are then cleaved by hot piperidine at the
position of the modified base. The concentration of the
modifying chemicals is controlled to introduce on average one
modification per DNA molecule.
Thus a series of labeled fragments is generated, from the
radiolabeled end to the first "cut" site in each molecule.
The fragments in the four reactions are electrophoresed side
by side in denaturing acrylamide gels for size separation.
To visualize the fragments, the gel is exposed to X-ray film
for autoradiography, yielding a series of dark bands each
corresponding to a radiolabeled DNA fragment, from which the
sequence may be inferred.
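The lane logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the chemistry: it assumes ideal cleavage, and it labels each band by the position of the cleaved base counted from the radiolabeled 5' end. The function names `cleavage_lanes` and `read_gel` are hypothetical.

```python
def cleavage_lanes(seq):
    """Return, for each of the four reactions, the positions that give a band.

    Positions are 1-based and counted from the labeled 5' end (an assumption
    made to keep the sketch simple).
    """
    lanes = {"G": set(), "A+G": set(), "C": set(), "C+T": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["A+G"].add(pos)
        elif base == "A":
            lanes["A+G"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return lanes


def read_gel(lanes):
    """Infer the sequence from the band pattern, shortest fragment first."""
    positions = sorted(set().union(*lanes.values()))
    calls = []
    for pos in positions:
        if pos in lanes["G"]:
            calls.append("G")   # band in both G and A+G lanes
        elif pos in lanes["A+G"]:
            calls.append("A")   # band in A+G lane only
        elif pos in lanes["C"]:
            calls.append("C")   # band in both C and C+T lanes
        else:
            calls.append("T")   # band in C+T lane only
    return "".join(calls)


print(read_gel(cleavage_lanes("GACGTGCA")))
```

A band that appears in the A+G lane but not in the G lane is called A, and a band in the C+T lane but not in the C lane is called T, which is why two of the four reactions can be partially specific.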
9. Sequencing gels are read from bottom to top (5′ to 3′).
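[Figure: Maxam-Gilbert Sequencing gel with lanes G, G+A, T+C and C. Shortest fragments run at the bottom (5′) and longer fragments at the top (3′). Sequence as labeled from top (3′) to bottom (5′): A A G C A A C G T G C A G.]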
10. The elegant idea behind DNA
sequencing
Fred Sanger
In the 1970s, Sanger's group developed a
fundamentally new method of 'reading' the linear DNA
sequence using special bases called chain terminators
or dideoxynucleotides.
This method is still in use today.
Sanger shared the 1980 Nobel Prize in Chemistry with Walter Gilbert and Paul Berg.
12. Sanger Method
1. A single strand of the DNA to be sequenced (yellow)
is hybridized to a 5′-end-labeled synthetic
oligodeoxynucleotide primer (brown).
2. The primer is elongated using DNA polymerase in four
separate reaction mixtures, each containing the four normal
deoxynucleotide triphosphates (dNTPs) plus one of the four
dideoxynucleotide triphosphates (ddNTPs) in a ratio of
100:1.
13. 3. In each tube, primer elongation is terminated by the
incorporation of a dideoxynucleotide triphosphate into the
newly synthesized chain.
14. 4. The synthesized DNA chains can then be separated by
gel electrophoresis. From this gel analysis of the fragments,
the sequence of the template DNA chain can be determined.
Progression of Sequencing Reaction
Sequencing Reaction Products
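The four-tube logic of the Sanger method can be sketched as follows. This is a simplified illustration under stated assumptions: every possible termination product is present, fragment lengths are counted from the end of the primer, and `template` means only the region copied downstream of the primer. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the newly made strand (5'->3') for a template given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_products(template):
    """Map each ddNTP tube to the lengths of the chains that end in it."""
    tubes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        # A chain of this length ends with a dideoxy version of `base`,
        # so it appears in the tube that contains that ddNTP.
        tubes[base].append(length)
    return tubes


def read_gel(tubes):
    """Read the four lanes from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "ATGCCGTA"
new_strand = read_gel(termination_products(template))
print(new_strand)                      # the synthesized strand, 5'->3'
print(synthesized_strand(new_strand))  # reverse complement recovers the template
```

The gel gives the sequence of the newly synthesized strand; taking its reverse complement recovers the template.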
16. The advancement of next-generation sequencing (NGS) technology
Kang, et al (2015) Plant Biotechnol. J., doi: 10.1111/pbi.12449
17. NGS Platforms
• Differ in design and chemistries
• Fundamentally related: all sequence
thousands to millions of clonally amplified
molecules in a massively parallel
manner
• Attractive for clinical applications, where
individual sequencing assays are
costly and laborious because of serial “gene
by gene” analysis
18. Cluster Amplification : Emulsion PCR
Fragments with adaptors (the library) are PCR amplified within a water drop in oil.
One PCR primer is attached to the surface of a bead.
DNA molecules are synthesized on the beads in the water droplet. Each bead bears
clonal DNA originating from a single DNA fragment
Beads (with attached DNA) are then deposited into the wells of sequencing chips –
one well, one bead
19. MPSS: Massively Parallel Signature
Sequencing
• MPSS is suited for quantification of gene
expression
• It uses multiple cycles of enzymatic cleavage
and ligation to determine 17–20-bp long
“signature” sequences from the ends of cDNA
molecules to distinguish and quantify the different
RNA species present in the sample
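The quantification step can be illustrated with a short sketch: identical signatures are counted, and the counts are taken as a measure of how abundant each RNA species is. This is only an illustration of the counting idea, assuming the signature is the first 17 bases of each cDNA end read; the function names are hypothetical and the cleavage and ligation chemistry is not modeled.

```python
from collections import Counter


def signature_counts(cdna_ends, signature_length=17):
    """Count identical signatures taken from the start of each cDNA end."""
    signatures = (
        read[:signature_length]
        for read in cdna_ends
        if len(read) >= signature_length  # skip reads too short to give a signature
    )
    return Counter(signatures)


def relative_abundance(counts):
    """Convert raw signature counts to fractions of all signatures."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {signature: n / total for signature, n in counts.items()}


reads = [
    "ACGTACGTACGTACGTAGG",
    "ACGTACGTACGTACGTACC",
    "ACGTACGTACGTACGTATT",
    "TTTTGGGGCCCCAAAATAA",
]
counts = signature_counts(reads)
for signature, fraction in relative_abundance(counts).items():
    print(signature, counts[signature], fraction)
```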
20. ROCHE/454 Sequencing
Produces comparatively long reads and sequences many fragments in parallel by reading
optical signals as bases are added.
The DNA or RNA is fragmented into shorter pieces of up to 1 kb.
Uses emulsion PCR for cluster amplification.
Uses pyrosequencing as the sequencing approach.
21. Pyrosequencing: a non-electrophoretic, bioluminescence method that measures the
release of inorganic pyrophosphate by proportionally converting it into visible light
using a series of enzymatic reactions.
Nucleotide incorporation generates light seen as a peak in the Pyrogram
trace.
[Figure: Pyrosequencing reaction cascade. Labels: adenosine 5′-phosphosulfate (APS); hydrolysis of the unincorporated dNTPs.]
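Because the light produced is proportional to the pyrophosphate released, each nucleotide dispensation gives a peak whose height reflects how many identical bases were incorporated in a row. A minimal sketch of an idealized Pyrogram follows. The cyclic dispensation order `TACG` is an assumption made for illustration, the function name `flowgram` is hypothetical, and noise is ignored.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=400):
    """Return (nucleotide, peak height) pairs for an idealized Pyrogram.

    `new_strand` is the strand being synthesized, 5'->3'. The peak height is
    the number of identical bases incorporated during one dispensation.
    """
    if set(new_strand) - set(flow_order):
        raise ValueError("strand contains bases that are never dispensed")
    peaks = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # All consecutive matching bases are incorporated in a single flow.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        peaks.append((nucleotide, run))
        flow += 1
    return peaks


for nucleotide, height in flowgram("TTAGGGC"):
    print(nucleotide, "#" * height)
```

A run of identical bases (a homopolymer) gives one tall peak rather than several separate ones, so its length has to be estimated from the peak height.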
22. ION TORRENT Sequencing
Ion Torrent and Ion Proton sequencing do not make use of optical signals.
They exploit the fact that addition of a dNTP to a DNA polymer releases an H+ ion.
Run time: 3 h; no termination or deprotection steps
Cluster amplification: emulsion PCR
Read length: 100–300 bp
Throughput determined by chip size: 10 Mb – 5 Gb
Cost: $1–$20/Mb
The pH change, if any, is used to determine how many
bases (if any) were added with each cycle.
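Turning per-cycle signals into bases can be sketched as rounding each signal to a whole number of incorporated nucleotides. This is a toy decoder under stated assumptions: signals are already normalized so that one incorporated base gives a value near 1.0, nucleotides are flowed in the cyclic order `TACG`, and the function name `decode_flows` is hypothetical.

```python
def decode_flows(signals, flow_order="TACG"):
    """Convert normalized pH signals, one per nucleotide flow, to a sequence."""
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        # Nearest whole number of bases added in this flow; never negative.
        count = max(0, round(signal))
        bases.append(nucleotide * count)
    return "".join(bases)


# Flows T, A, C, G, T, A, C with slightly noisy signals.
print(decode_flows([2.1, 0.9, 0.05, 2.8, 0.1, 0.0, 1.2]))
```

The larger the signal, the smaller the relative gap between neighbouring counts, so a fixed amount of noise is more likely to shift a long run of identical bases to the wrong length than a single base.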
23. ABI SOLiD Sequencing
The AB SOLiD™ 3 System generates over 20 gigabases and 400 M
tags per run.
Workflow: library preparation, emulsion PCR on beads, sequencing by ligation.
Applied Biosystems, USA, commercialized the Polony method in
2005 as the SOLiD 3.0 platform.
SOLiD stands for “sequencing by oligonucleotide ligation detection”
since this method achieves DNA sequencing by detecting oligonucleotide
ligation.
24. Sequencing by Ligation
Sequencing by Ligation (SBL) uses the enzyme DNA ligase to identify the
nucleotide present at a given position in a DNA sequence.
Each oligonucleotide is 8 bases long and is labeled with a fluorophore at the 5′ end, and each
member of a set of 16 oligos has a unique combination of two nucleotides at its 3′ end.
Fluorescence from each cycle is recorded and analyzed to obtain the base sequence of
the template strand.
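The step from recorded fluorescence to a base sequence can be illustrated with the two-base ("color space") encoding used by SOLiD, in which the 16 possible dinucleotides share four dye colors. The slide does not give the color table, so the mapping below is an assumption for illustration: bases are numbered A=0, C=1, G=2, T=3 and the color of a dinucleotide is the XOR of its two numbers. Function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(sequence):
    """Return one color (0-3) per overlapping dinucleotide."""
    return [
        BASE_TO_CODE[first] ^ BASE_TO_CODE[second]
        for first, second in zip(sequence, sequence[1:])
    ]


def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        # Each color, combined with the previous base, fixes the next base.
        bases.append(CODE_TO_BASE[BASE_TO_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)
print(decode_colors("A", colors))
```

Each color stands for four of the 16 dinucleotides, so a color alone does not name a base. Decoding needs one known base (in practice the last base of the adaptor) to start the chain.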
25. Cluster Amplification : Bridge PCR
DNA fragments are flanked with adaptors (library)
A solid surface is coated with primers complementary to the two adaptor sequences
Isothermal amplification, with one end of each “bridge” tethered to the surface
Clusters of DNA molecules are generated on the chip. Each cluster originates from a
single DNA fragment and is thus a clonal population.
26. Solid-phase amplification can produce 100-200 million spatially separated clusters, providing free
ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction
ILLUMINA/SOLEXA Sequencing
In each read location,
there will be a fluorescent
signal indicating the base
that has been added
The terminators are removed,
allowing the next base to be
added, and the fluorescent
signal is removed, preventing
the signal from contaminating
the next image
The process is repeated, adding one
nucleotide at a time and imaging in
between. Computers are then used to
detect the base at each site in each
image, and these base calls are used to construct
a sequence.
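The image-to-sequence step can be sketched for a single cluster: each cycle yields one intensity per dye, and the brightest dye gives the base call. This is a deliberately naive illustration, assuming four separate dye channels ordered A, C, G, T and no correction for cross-talk or phasing. The function name `call_bases` is hypothetical.

```python
CHANNELS = "ACGT"  # assumed channel order, one dye per base


def call_bases(cycle_intensities):
    """Call one base per cycle from four-channel intensities of one cluster."""
    calls = []
    for intensities in cycle_intensities:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel")
        # The brightest channel in this cycle's image gives the base.
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


cluster = [
    (0.90, 0.10, 0.05, 0.02),  # cycle 1
    (0.10, 0.10, 0.80, 0.20),  # cycle 2
    (0.00, 0.10, 0.10, 0.70),  # cycle 3
]
print(call_bases(cluster))
```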
27. TGS– Third Generation Sequencing
• The TGS methods do not use PCR amplification for
template preparation because they sequence single DNA
molecules
Helicos Genetic Analysis System
Single-Molecule Real-Time Technology
The Nanopore Sequencing Technologies
28. SMRT (single-molecule real-time)
- Flowcell: zero-mode waveguide (ZMW) ‘wells’, each holding 20 zeptoliters (1 zeptoliter = 10^-21 liters).
- The ZMW holes are ~70 nm in diameter and ~100 nm in depth. Light propagation is impaired through such a small aperture →
the optical field decays exponentially inside the chamber.
- Within this tiny volume, the activity of DNA polymerase incorporating a single nucleotide can be readily detected,
at a pace of ~3 bases/second
- DNA polymerase immobilized in the bottom: detection of only the bottom of the well where the nucleotide
incorporation happens
- dNTP incorporation on each single-molecule visualized with a laser and camera: recording of the color and
duration of emitted light (labeled nucleotide pauses during incorporation at the bottom of the ZMW)
29. Helicos Genetic Analysis
Primers (50 nucleotide long poly(dT) oligos) immobilized on a microfluidics
cell
Primer extension by addition of dNTP
The growing chain produces fluorescence due to Cy3/Cy5
Fluorescence recorded
31. Nanopore sequencing
identifies individual bases
as a strand of DNA is
passed through a pore.
Nanopore technologies
This new technology has been dubbed “next, next-generation sequencing”.
Demonstrating the difference between the
pop culture Minion on the left and the
genome sequencing MinION on the right.
So how does it work?
At the core of the MinION are two biological components: the nanopore protein and a motor protein.
The nanopore protein sits on top of an artificial layer and acts as a microscopic sluice gate that controls
how much of the sample solution passes through it into the lower layer. The sample solution contains
DNA, but also ions that pass through the nanopore, thus creating a measurable electrical current. If a
big molecule like a strand of DNA passes through the nanopore, the flow of ions is perturbed, which
results in a change in the electrical current. These changes are recorded and interpreted to give the
sequence of said DNA molecule. Meanwhile, the motor protein sticks to a DNA molecule, attaches
itself to the top of the nanopore, and feeds the DNA through the nanopore as a single strand at a
certain speed. Each MinION device has thousands of nanopores allowing for as many molecules to pass
through and be sequenced in real time. The sequence data are sent to a cloud server in real time,
where they are transformed and analyzed and the final data sent back to the user. This eliminates the
need for an expensive computer infrastructure as well as the need for extensive training in
bioinformatics.
32. “The MinION has been used to successfully read
the genome of a lambda bacteriophage, which has
48,500-ish base pairs, twice during one pass. That's
impressive, because reading 100,000 base pairs
during a single DNA capture has never been
managed before using traditional sequencing
techniques.
The operational life of the MinION is only about
six hours, but during that time it can read more
than 150 million base pairs. That's somewhat
short of the larger human chromosomes (which
contain up to 250 million base pairs), but Oxford
Nanopore has also introduced GridION -- a
platform where multiple cartridges can be
clustered together. The company reckon that a 20-
node GridION setup can sequence a complete
human genome in just 15 minutes.” —Wired
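The figures in the quoted passage can be put side by side with a little arithmetic. The sketch below only restates the numbers from the quote (about six hours, 150 million base pairs, a 48,500 bp lambda genome, chromosomes of up to 250 million base pairs); the variable and function names are illustrative.

```python
BASES_PER_RUN = 150_000_000         # "more than 150 million base pairs"
RUN_HOURS = 6                       # "about six hours"
LAMBDA_GENOME_BP = 48_500           # "48,500-ish base pairs"
LARGEST_CHROMOSOME_BP = 250_000_000


def bases_per_second(total_bases, hours):
    """Average output of the whole device over its operational life."""
    return total_bases / (hours * 3600)


rate = bases_per_second(BASES_PER_RUN, RUN_HOURS)
double_pass_bp = 2 * LAMBDA_GENOME_BP
chromosome_fraction = BASES_PER_RUN / LARGEST_CHROMOSOME_BP

print(f"average output: {rate:.0f} bases per second")
print(f"lambda genome read twice: {double_pass_bp} bases in one capture")
print(f"share of a 250 Mb chromosome covered once: {chromosome_fraction:.0%}")
```

Reading the lambda genome twice gives 97,000 bases, which is the "100,000 base pairs" figure in round numbers.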
33. Van et al (2012) SABRAO J. Breed. Genet. 45 (1) 84-99
Overview of whole-genome sequencing by next-generation
sequencing method
34. Comparison of high-throughput sequencing methods
Single-molecule real-time sequencing (Pacific Biosciences)
• Read length: 10,000 bp to 15,000 bp average (14,000 bp); maximum read length >40,000 bases
• Accuracy (single read, not consensus): 87% single-read accuracy
• Reads per run: 50,000 per SMRT cell, or 500–1000 megabases
• Time per run: 30 minutes to 4 hours
• Cost per 1 million bases (US$): $0.13–$0.60
• Advantages: Longest read length. Fast. Detects 4mC, 5mC, 6mA.
• Disadvantages: Moderate throughput. Equipment can be very expensive.

Ion semiconductor (Ion Torrent sequencing)
• Read length: up to 400 bp
• Accuracy (single read, not consensus): 98%
• Reads per run: up to 80 million
• Time per run: 2 hours
• Cost per 1 million bases (US$): $1
• Advantages: Less expensive equipment. Fast.
• Disadvantages: Homopolymer errors.

Pyrosequencing (454)
• Read length: 700 bp
• Accuracy (single read, not consensus): 99.9%
• Reads per run: 1 million
• Time per run: 24 hours
• Cost per 1 million bases (US$): $10
• Advantages: Long read size. Fast.
• Disadvantages: Runs are expensive. Homopolymer errors.

Sequencing by synthesis (Illumina)
• Read length: MiniSeq, NextSeq: 75-300 bp; MiSeq: 50-600 bp; HiSeq 2500: 50-500 bp; HiSeq 3/4000: 50-300 bp; HiSeq X: 300 bp
• Accuracy (single read, not consensus): 99.9% (Phred30)
• Reads per run: MiniSeq/MiSeq: 1-25 million; NextSeq: 130-00 million; HiSeq 2500: 300 million - 2 billion; HiSeq 3/4000: 2.5 billion; HiSeq X: 3 billion
• Time per run: 1 to 11 days, depending upon sequencer and specified read length
• Cost per 1 million bases (US$): $0.05 to $0.15
• Advantages: Potential for high sequence yield, depending upon sequencer model and desired application.
• Disadvantages: Equipment can be very expensive. Requires high concentrations of DNA.

Sequencing by ligation (SOLiD sequencing)
• Read length: 50+35 or 50+50 bp
• Accuracy (single read, not consensus): 99.9%
• Reads per run: 1.2 to 1.4 billion
• Time per run: 1 to 2 weeks
• Cost per 1 million bases (US$): $0.13
• Advantages: Low cost per base.
• Disadvantages: Slower than other methods. Has issues sequencing palindromic sequences.

Nanopore sequencing (MinION, Oxford Nanopore)
• Read length: 5.4 kb average (up to 300 kb reported)
• Accuracy (single read, not consensus): ~90% single read (up to 99% consensus)
• Reads per run: 4.4 million
• Time per run: 1 min to 48 hrs
• Cost per 1 million bases (US$): $0.11 - $0.5
• Advantages: Very long reads. Affordable equipment (MinION starter kit is only $1000 USD). Portable (palm sized).
• Disadvantages: Lower throughput than other machines. Only 90% single-read accuracy.

Chain termination (Sanger sequencing)
• Read length: 400 to 900 bp
• Accuracy (single read, not consensus): 99.9%
• Reads per run: N/A
• Time per run: 20 minutes to 3 hours
• Cost per 1 million bases (US$): $2400
• Advantages: Long individual reads. Useful for many applications.
• Disadvantages: More expensive and impractical for larger sequencing projects. This method also requires the time-consuming step of plasmid cloning or PCR.
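The accuracy figures in this comparison can be expressed on the Phred scale, where Q = -10 · log10(error probability); this is how 99.9% accuracy corresponds to Phred 30. A minimal sketch of the conversion follows, with hypothetical function names.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert per-base accuracy (between 0 and 1) to a Phred quality score."""
    error = 1.0 - accuracy
    if not 0.0 < error <= 1.0:
        raise ValueError("accuracy must be at least 0 and strictly below 1")
    return -10.0 * math.log10(error)


def phred_to_accuracy(quality):
    """Convert a Phred quality score back to per-base accuracy."""
    return 1.0 - 10.0 ** (-quality / 10.0)


for label, accuracy in [
    ("87% single read", 0.87),
    ("90% single read", 0.90),
    ("98%", 0.98),
    ("99.9%", 0.999),
]:
    print(f"{label}: Q{accuracy_to_phred(accuracy):.1f}")
```

On this scale every 10 points is a tenfold drop in the error rate, so the step from 90% (about Q10) to 99.9% (Q30) is a hundredfold reduction in errors per base.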
35. Next Generation Sequencing (NGS) Applications
Production of short sequence reads in a very high-throughput
and cost-effective manner
De novo sequencing
Targeted resequencing of genomes
Fine mapping of QTLs
QTL identification
Candidate gene identification
Molecular diagnostics for oncology and inherited disease
studies
Protein-coding gene annotation
Discovery of novel exons and introns