Next Generation Sequencing

What is Next Generation DNA Sequencing?
Next generation sequencing (NGS), also known as
high-throughput sequencing, is the catch-all term used to
describe a number of different modern sequencing
technologies including:
• Illumina (Solexa) sequencing
• Roche 454 sequencing
• Ion Torrent: Proton / PGM sequencing
• SOLiD sequencing
These recent technologies allow us to sequence DNA and RNA much
more quickly and cheaply than the previously used Sanger
sequencing.
Dr. Shiny C Thomas, Department of Biosciences, ADBU

The four main advantages of NGS over classical Sanger
sequencing are:
• speed
• cost
• sample size
• accuracy
• NGS is significantly cheaper, quicker, needs significantly
less DNA and is more accurate and reliable than Sanger
sequencing. Let us look at this more closely.
• For Sanger sequencing, a large amount of template DNA
is needed for each read.
• In NGS, a sequence can be obtained from a single strand.

• NGS is quicker than Sanger sequencing in two ways.
Firstly, the chemical reaction may be combined with the
signal detection in some versions of NGS, whereas in
Sanger sequencing these are two separate processes.
• Secondly and more significantly, only one read
(maximum ~1 kb) can be taken at a time in Sanger
sequencing, whereas NGS is massively parallel,
allowing 300 Gb of DNA to be read on a single run on a
single chip.

• The reduced time, manpower and reagents in NGS mean that the
costs are much lower. The first human genome sequence cost in
the region of £300M.
Using modern Sanger sequencing methods, aided by data from the
known sequence, a full human genome would still cost £6M.
Sequencing a human genome with Illumina today would cost only
£6,000.

Evolution of Technologies
Gene Transcription

Sequencing for Transcriptomics?
• Sample cells
• Purify mRNA*
• mRNA ⇒ cDNA
• Cleave into 35-400 nt fragments
• PCR amplification
• Sequence fragments in parallel
* or DNA for genomic sequencing

Next Generation Sequencing
Technology Overview
1. (c)DNA is fragmented
2. Adaptors ligated to fragments
3. Several possible protocols yield array of PCR colonies.
4. Enzymatic extension with fluorescently tagged
nucleotides.
5. Cyclic readout by imaging the array.

Technology 1. (emulsion PCR)
• Fragmentation of DNA
• Primers are attached to the surface of a bead

2. Bead preparation
• Fragments, with adaptors, are PCR amplified within a
water drop in oil.
• One primer is attached to the surface of a bead
• 3’ modification of fragments, to covalently bind bead to
chip surface.

Sequence of images
Sequence read over multiple cycles
Repeated cycles of sequencing to determine the
sequence of bases in a given fragment, a single
base at a time.

• While “first-generation” capillary electrophoresis (CE)
instruments were considered high throughput for their time, the
Genome Analyzer emerged in 2005 and took sequencing runs from
84 kilobases (kb) per run to 1 gigabase (Gb) per run.
• The short-read, massively parallel sequencing technique
was a fundamentally different approach that
revolutionized sequencing capabilities and launched the
“next generation” in genomic science.
• From that point forward, the data output of next-generation
sequencing (NGS) has grown rapidly, more than doubling
each year.

Figure 1: Sequencing Cost and Data Output Since 2000—The dramatic
rise of data output and concurrent falling cost of sequencing since
2000. The Y-axes on both sides of the graph are logarithmic.

• In 2005, with the Genome Analyzer, a single sequencing
run could produce roughly one gigabase of data.
• By 2014, the rate had climbed to 1.8 terabases of data in a
single sequencing run, a more than 1000× increase.
• It is remarkable to reflect on the fact that the first human
genome, famously co-published in Science and Nature in
2001, required 15 years to sequence and cost nearly 3
billion dollars.
• In contrast, the HiSeq X™ Ten, released in 2014, can
sequence over 45 human genomes in a single day for
approximately $1000 each.

Human Genome Sequencing Over the Decades—The capacity to sequence all 3.2
billion bases of the human genome (at 30× coverage) has increased exponentially
since the 1990s.
In 2005, with the introduction of the Illumina Genome Analyzer System, 1.3 human
genomes could be sequenced annually. Nearly 10 years later, with the Illumina HiSeq
X Ten fleet of sequencing systems, the number has climbed to 18,000 human
genomes a year.
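
The coverage figures in this caption reduce to simple arithmetic. The sketch below uses only numbers quoted in these slides (3.2 billion bases, 30× coverage, 1.8 terabases per run); the function and constant names are hypothetical, and real instruments lose part of their raw output to quality filtering, so this is an upper-bound estimate rather than a specification.

```python
# Figures quoted in the slides; names are illustrative only.
GENOME_SIZE_BASES = 3.2e9    # human genome size
COVERAGE = 30                # each base read 30 times on average
RUN_OUTPUT_BASES = 1.8e12    # 1.8 terabases in a single run (2014 figure)

def bases_per_genome(genome_size, coverage):
    """Total bases that must be sequenced to reach the target coverage."""
    return genome_size * coverage

required = bases_per_genome(GENOME_SIZE_BASES, COVERAGE)
genomes_per_run = RUN_OUTPUT_BASES / required

print(f"Bases needed per genome: {required / 1e9:.0f} Gb")
print(f"Genomes per 1.8 Tb run:  {genomes_per_run:.2f}")
```

At 30× coverage one human genome needs 96 Gb of sequence, so a 1.8 Tb run holds at most about 18 to 19 genomes before any data is discarded.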

• Beyond the massive increase in data output, the
introduction of NGS technology has transformed the way
scientists think about genetic information.
• The $1000 genome enables population-scale
sequencing and establishes the foundation for
personalized genomic medicine as part of standard
medical care.
• Researchers can now analyze thousands to tens of
thousands of samples in a single year.
• The rate of progress is stunning.

• As costs continue to come down, we are entering a period
where we are going to be able to get the complete catalog
of disease genes.
• This will allow us to look at thousands of people and see
the differences among them, to discover critical genes that
cause cancer, autism, heart disease, or schizophrenia.

The Basics of NGS Chemistry
• In principle, the concept behind NGS technology is
similar to CE (Capillary Electrophoresis) sequencing—
DNA polymerase catalyzes the incorporation of
fluorescently labeled deoxyribonucleotide triphosphates
(dNTPs) into a DNA template strand during sequential
cycles of DNA synthesis.
• During each cycle, at the point of incorporation, the
nucleotides are identified by fluorophore excitation.
• The critical difference is that, instead of sequencing a
single DNA fragment, NGS extends this process across
millions of fragments in a massively parallel fashion.

• Illumina sequencing by synthesis (SBS) chemistry is the
most widely adopted chemistry in the industry and
delivers the highest accuracy, the highest yield of error-
free reads, and the highest percentage of base calls above
Q30.
• The Illumina NGS workflows include 4 basic steps

1. Library Preparation—
• The sequencing library is prepared by random
fragmentation of the DNA or cDNA sample, followed by
5’ and 3’ adapter ligation.
• Alternatively, “tagmentation” combines the
fragmentation and ligation reactions into a single step
that greatly increases the efficiency of the library
preparation process.
• Adapter-ligated fragments are then PCR amplified and
gel purified.
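
As a rough illustration of library preparation, the sketch below cuts a sequence into random-length pieces and attaches an adapter to each end. The adapter sequences, size range and function names are placeholders chosen for this example; they are not real Illumina adapters or kit parameters.

```python
import random

# Hypothetical placeholder adapters, NOT real Illumina sequences.
ADAPTER_5 = "AAAAACCCCC"
ADAPTER_3 = "GGGGGTTTTT"

def fragment(dna, min_len, max_len, rng):
    """Cut dna into consecutive pieces of random length.

    Each piece is between min_len and max_len bases long, except the
    last one, which may be shorter.
    """
    pieces, pos = [], 0
    while pos < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[pos:pos + size])
        pos += size
    return pieces

def ligate_adapters(piece):
    """Attach the 5' and 3' adapters to one fragment."""
    return ADAPTER_5 + piece + ADAPTER_3

rng = random.Random(0)  # fixed seed so the example is repeatable
sample = "".join(rng.choice("ACGT") for _ in range(200))
library = [ligate_adapters(p) for p in fragment(sample, 20, 40, rng)]

print(len(library), "adapter-ligated fragments")
```

Real fragmentation produces overlapping fragments from many copies of the genome; cutting a single copy end to end is a simplification that keeps the example short.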

2. Cluster Generation—
• For cluster generation, the library is loaded into a
flow cell where fragments are captured on a lawn
of surface-bound oligos complementary to the
library adapters.
• Each fragment is then amplified into distinct, clonal
clusters through bridge amplification.
• When cluster generation is complete, the
templates are ready for sequencing.

3. Sequencing—
• Illumina SBS technology utilizes a proprietary
reversible terminator–based method that detects
single bases as they are incorporated into DNA
template strands.
• As all 4 reversible terminator-bound dNTPs are
present during each sequencing cycle, natural
competition minimizes incorporation bias and
greatly reduces raw error rates compared to other
technologies.
• The result is highly accurate base-by-base
sequencing that virtually eliminates sequence-
context-specific errors, even within repetitive
sequence regions and homopolymers.
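
Base-call accuracy, including the Q30 threshold mentioned for SBS chemistry, is conventionally expressed as a Phred quality score, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The helper names below are illustrative and the quality values are made up for the example.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with quality q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for an error probability p."""
    return -10 * math.log10(p)

def fraction_at_or_above(qualities, threshold=30):
    """Share of base calls whose quality is at least the threshold."""
    return sum(q >= threshold for q in qualities) / len(qualities)

print(phred_to_error_prob(30))                 # about 1 error in 1000 calls
print(fraction_at_or_above([35, 32, 28, 40]))  # made-up qualities
```

Q30 therefore corresponds to roughly one wrong call per 1000 bases, and Q20 to one per 100.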

4. Data Analysis—
• During data analysis and alignment, the newly
identified sequence reads are then aligned to a
reference genome.
• Following alignment, many variations of analysis
are possible such as single nucleotide
polymorphism (SNP) or insertion-deletion (indel)
identification, read counting for RNA methods, phylogenetic
or metagenomic analysis, and more.
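
The alignment and SNP identification steps can be sketched on a toy scale. The code below places each read at the reference offset with the fewest mismatches and then reports positions where the reads agree on a base that differs from the reference. This brute-force search is an assumption made for illustration; production aligners use indexed, gap-aware algorithms, and the sequences here are invented.

```python
from collections import Counter, defaultdict

def best_offset(read, reference):
    """Return (offset, mismatches) for the placement with fewest mismatches."""
    best = None
    for off in range(len(reference) - len(read) + 1):
        window = reference[off:off + len(read)]
        mism = sum(a != b for a, b in zip(read, window))
        if best is None or mism < best[1]:
            best = (off, mism)
    return best

def call_snps(reads, reference, min_depth=2):
    """Map position -> (reference base, observed base) for candidate SNPs."""
    pileup = defaultdict(Counter)
    for read in reads:
        off, _ = best_offset(read, reference)
        for i, base in enumerate(read):
            pileup[off + i][base] += 1
    snps = {}
    for pos, counts in sorted(pileup.items()):
        base, depth = counts.most_common(1)[0]
        if depth >= min_depth and base != reference[pos]:
            snps[pos] = (reference[pos], base)
    return snps

reference = "ACGTTGCAAGCTTAGGCTAACG"
# Invented reads from a sample carrying A>T at 0-based position 8.
reads = ["GTTGCATGCT", "TGCATGCTTA", "CATGCTTAGG"]

print(call_snps(reads, reference))
```

Requiring a minimum depth before calling a variant is one simple way to keep isolated sequencing errors from being reported as SNPs.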

Next-Generation Sequencing Chemistry Overview

Advances in Sequencing Technology
Paired-End Sequencing
• A major advance in NGS technology occurred with the
development of paired-end (PE) sequencing.
• PE sequencing involves sequencing both ends of the DNA
fragments in a sequencing library and aligning the
forward and reverse reads as read pairs.
• In addition to producing twice the number of reads for
the same time and effort in library preparation,
sequences aligned as read pairs enable more accurate
read alignment and more reliable detection of indels, which
is much harder with single-read data.
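
A read pair can be pictured as one read taken from each end of the same fragment, with the second read reported as the reverse complement because it is read from the opposite strand. The fragment sequence and read length below are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_len):
    """Return (read1, read2) taken from the two ends of a fragment."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2

fragment = "ATGGCCTTAGCAGGTTCACGGATCCA"  # invented 26-base fragment
read1, read2 = paired_end_reads(fragment, 8)

print(read1)  # ATGGCCTT
print(read2)  # TGGATCCG
```

The unsequenced middle of the fragment is what gives the pair its known spacing once both reads are aligned.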

• Analysis of differential read-pair spacing also allows
removal of PCR duplicates, a common artifact resulting
from PCR amplification during library preparation.
• Furthermore, paired-end sequencing produces a higher
number of SNV calls following read-pair alignment.
• While some methods are best served by single-read
sequencing, such as small RNA sequencing, most
researchers currently use the paired-end approach.
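
PCR duplicate removal can be sketched by treating read pairs whose outer alignment coordinates are identical as copies of the same original fragment and keeping only the first. The record layout (name, start, end) is a simplification assumed for this example; real tools also take chromosome and strand into account and usually keep the best-quality copy.

```python
def remove_pcr_duplicates(pairs):
    """Keep the first read pair seen for each (start, end) coordinate pair."""
    seen = set()
    unique = []
    for pair in pairs:
        key = (pair["start"], pair["end"])
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique

# Invented alignments: pair1 and pair2 span exactly the same fragment.
pairs = [
    {"name": "pair1", "start": 100, "end": 420},
    {"name": "pair2", "start": 100, "end": 420},
    {"name": "pair3", "start": 100, "end": 450},
    {"name": "pair4", "start": 130, "end": 420},
]

kept = remove_pcr_duplicates(pairs)
print([p["name"] for p in kept])
```

Single reads offer only one coordinate to compare, which is why the second end of the pair makes duplicate detection more specific.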

Paired-End Sequencing and Alignment—Paired-end
sequencing enables both ends of the DNA fragment to be
sequenced.
• Because the distance between each paired read is known,
alignment algorithms can use this information to map the
reads over repetitive regions more precisely. This results in
much better alignment of the reads, especially across
difficult-to-sequence, repetitive regions of the genome.
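
One way to picture how the known distance helps in repeats: if one read of a pair maps uniquely and its mate matches several copies of a repeat, the copy whose spacing fits the expected fragment size is chosen. The coordinates, expected size and tolerance below are invented, and the function name is hypothetical.

```python
def place_pair(read1_hits, read2_hits, expected_insert, tolerance):
    """Pick the (read1, read2) positions whose spacing best fits the library.

    Returns None when no combination lies within the tolerance.
    """
    best = None
    for p1 in read1_hits:
        for p2 in read2_hits:
            span = p2 - p1
            if span <= 0:
                continue  # the mate must lie downstream of read 1
            error = abs(span - expected_insert)
            if error <= tolerance and (best is None or error < best[2]):
                best = (p1, p2, error)
    return None if best is None else best[:2]

# Read 1 maps uniquely; read 2 matches three copies of a repeat.
read1_hits = [1000]
read2_hits = [1290, 5290, 9290]

print(place_pair(read1_hits, read2_hits, expected_insert=300, tolerance=50))
```

Only the repeat copy about 300 bases downstream is consistent with the library, so the ambiguous read is placed there.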

What is the Illumina method of DNA sequencing?
Illumina sequencing has been used to sequence many
genomes and has enabled the comparison of DNA sequences
to improve understanding of health and disease.
Illumina sequencing generates many millions of highly
accurate reads, making it much faster and cheaper than
other available sequencing methods.
How does Illumina DNA sequencing work?
1. The first step in this sequencing technique is to break up
the DNA into more manageable fragments of around 200 to
600 base pairs.
2. Short sequences of DNA, called adaptors, are attached to
the DNA fragments.
3. The DNA fragments attached to adaptors are then made
single stranded. This is done by incubating the fragments
with sodium hydroxide.
4. Once prepared, the DNA fragments are washed across the
flow cell. The complementary DNA binds to primers on the
surface of the flow cell and DNA that doesn’t attach is
washed away.

5. The DNA attached to the flow cell is then replicated to
form small clusters of DNA with the same sequence.
• When sequenced, each cluster of DNA molecules will
emit a signal that is strong enough to be detected by a
camera.
6. Unlabelled nucleotides and DNA polymerase are then
added to lengthen and join the strands of DNA attached to
the flow cell.
• This creates ‘bridges’ of double-stranded DNA between
the primers on the flow cell surface.
7. The double-stranded DNA is then broken down into single
stranded DNA using heat, leaving several million dense
clusters of identical DNA sequences.

8. Primers and fluorescently labelled terminators
(terminators are versions of the nucleotide bases A, C, G or T
that stop DNA synthesis) are added to the flow cell.
9. The primer attaches to the DNA being sequenced.
10. The DNA polymerase then binds to the primer and adds
the first fluorescently-labelled terminator to the new DNA
strand.
• Once a base has been added no more bases can be added
to the strand of DNA until the terminator base is cut from
the DNA.

11. Lasers are passed over the flow cell to activate the
fluorescent label on the nucleotide base.
• This fluorescence is detected by a camera and recorded
on a computer. Each of the terminator bases (A, C, G and
T) gives off a different colour.
12. The fluorescently-labelled terminator group is then
removed from the first base and the next fluorescently-
labelled terminator base can be added alongside.
• And so the process continues until millions of clusters
have been sequenced.

13. The DNA sequence is analysed base-by-base during
Illumina sequencing, making it a highly accurate method.
• The sequence generated can then be aligned to a
reference sequence to look for matches or changes in
the sequenced DNA.

➢ In Illumina sequencing, 100-150 bp reads are used.
➢ Somewhat longer fragments are ligated to generic
adaptors and annealed to a slide using the adaptors.
➢ PCR is carried out to amplify each read, creating a spot
with many copies of the same read.
➢ They are then separated into single strands to be
sequenced.
➢ The slide is flooded with nucleotides and DNA polymerase.
➢ These nucleotides are fluorescently labelled, with the
colour corresponding to the base.
➢ They also have a terminator, so that only one base is
added at a time.

➢ An image is taken of the slide. In each read location, there
will be a fluorescent signal indicating the base that has
been added.

➢ The slide is then prepared for the next cycle. The
terminators are removed, allowing the next base to be
added, and the fluorescent signal is removed, preventing
the signal from contaminating the next image.
➢ The process is repeated, adding one nucleotide at a time
and imaging in between.
➢ Computers are then used to detect the base at each site
in each image and these are used to construct a
sequence.

➢ All of the sequence reads will be the same length, as the
read length depends on the number of cycles carried out.
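
The imaging cycle described above can be sketched as a toy base caller: in every cycle each cluster is assigned the base whose colour channel is brightest, so each read grows by exactly one base per cycle. The intensity values and the four-channel layout are assumptions made for this example, not instrument output.

```python
CHANNELS = "ACGT"  # one colour channel per base, as described in the slides

def call_bases(images):
    """Turn per-cycle channel intensities into one read per cluster.

    images[cycle][cluster] is a tuple of four intensities in the order
    A, C, G, T.
    """
    reads = [""] * len(images[0])
    for cycle in images:
        for i, intensities in enumerate(cycle):
            brightest = max(range(4), key=lambda ch: intensities[ch])
            reads[i] += CHANNELS[brightest]
    return reads

# Made-up intensities for two clusters over three cycles.
images = [
    [(0.9, 0.1, 0.0, 0.1), (0.1, 0.0, 0.8, 0.1)],  # cycle 1
    [(0.1, 0.9, 0.1, 0.0), (0.0, 0.1, 0.9, 0.1)],  # cycle 2
    [(0.0, 0.1, 0.1, 0.9), (0.9, 0.0, 0.1, 0.1)],  # cycle 3
]

reads = call_bases(images)
print(reads)  # ['ACT', 'GGA']
```

Three cycles give three-base reads for every cluster, which is why all Illumina reads from one run share the same length.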

Ion Torrent
• The sequencing chemistry itself is remarkably simple.
Naturally, a proton is released when a nucleotide is
incorporated by the polymerase in the DNA molecule,
resulting in a detectable local change of pH.
• Each micro-well of the Ion Torrent semiconductor
sequencing chip contains approximately one million
copies of a DNA molecule.
• The Ion Personal Genome Machine (PGM™) sequencer
sequentially floods the chip with one nucleotide after
another.

• If a nucleotide complements the sequence of the DNA
molecule in a particular micro-well, it will be
incorporated and hydrogen ions are released.
• The pH of the solution changes in that well and is
detected by the ion sensor, essentially going directly
from chemical information to digital information.
• If there are two identical bases on the DNA strand, the
voltage is double, and the chip records two identical
bases.
• If the next nucleotide that floods the chip is not a match,
no voltage change is recorded and no base is called.

• Because this is direct detection—no scanning, no
cameras, no light—each nucleotide incorporation is
measured in seconds enabling very short run times.

Ion Torrent: Proton / PGM sequencing
• Unlike Illumina and 454, Ion Torrent and Ion Proton
sequencing do not make use of optical signals.
• Instead, they exploit the fact that addition of a dNTP
to a DNA polymer releases an H+ ion.
• As in other kinds of NGS, the input DNA or RNA is
fragmented, this time ~200bp. Adaptors are added and
one molecule is placed onto a bead.
• The molecules are amplified on the bead by emulsion
PCR.
• Each bead is placed into a single well of a slide.
• As in 454, the slide is flooded with a single species of dNTP
at a time, along with buffers and polymerase.

• The pH is detected in each of the wells, as each H+ ion
released will decrease the pH.
• The changes in pH allow us to determine if that base,
and how many thereof, was added to the sequence
read.
• The dNTPs are washed away, and the process is repeated
cycling through the different dNTP species.

The pH change, if any, is used to determine how many bases
(if any) were added with each cycle.
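
The flow-by-flow logic can be sketched for a single well. Each flow offers one dNTP species, and the signal is the number of bases incorporated, which is zero when the nucleotide does not match and more than one for a homopolymer. The flow order used here and the idealised, noise-free signal are assumptions for illustration.

```python
def simulate_flows(read_sequence, flow_order="TACG", n_flows=8):
    """Return (nucleotide, bases incorporated) for each flow in one well.

    read_sequence is the strand being synthesised, 5' to 3'.
    """
    signals, pos = [], 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        count = 0
        # A homopolymer run is incorporated completely within one flow.
        while pos < len(read_sequence) and read_sequence[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals

for nucleotide, count in simulate_flows("TTAGGC"):
    print(nucleotide, count)
```

For the read TTAGGC the first T flow gives a double signal, the following C flow gives none, and the G flow gives another double signal.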

Pyrosequencing
• Pyrosequencing is a method of DNA sequencing
(determining the order of nucleotides in DNA) based on
the "sequencing by synthesis" principle.
• It differs from Sanger sequencing, in that it relies on the
detection of pyrophosphate release on nucleotide
incorporation, rather than chain termination with
dideoxynucleotides.

• The DNA sequence is determined from the light emitted
upon incorporation of the next complementary
nucleotide.
• Only one of the four possible nucleotides (A, T, C or G)
is added and available at a time,
• so that only one kind of letter can be incorporated on the
single-stranded template (which is the sequence to be
determined).

• The intensity of the light indicates whether there is more
than one of these "letters" in a row.
• The previous nucleotide (one of the four possible
dNTPs) is degraded before the next nucleotide is
added for synthesis, so that the light produced in each
step reflects only the nucleotide just added (and only if
it is the next complementary letter in the sequence).
• This process is repeated with each of the four letters until
the DNA sequence of the single-stranded template is
determined.

• "Sequencing by synthesis" involves taking a single strand
of the DNA to be sequenced and then synthesizing its
complementary strand enzymatically.
• The pyrosequencing method is based on detecting the
activity of DNA polymerase (a DNA synthesizing enzyme)
with another, chemiluminescent enzyme.
• Essentially, the method allows sequencing of a single
strand of DNA by synthesizing the complementary strand
along it, one base pair at a time, and detecting which base
was actually added at each step.

• The template DNA is immobile, and solutions of A, C, G,
and T nucleotides are sequentially added and removed
from the reaction.
• Light is produced only when the nucleotide solution
complements the first unpaired base of the template.
• The sequence of solutions which produce
chemiluminescent signals allows the determination of the
sequence of the template.

• The single strand DNA (ssDNA) template is hybridized to
a sequencing primer and incubated with the enzymes
DNA polymerase, ATP sulfurylase, luciferase and apyrase,
and with the substrates adenosine 5´ phosphosulfate
(APS) and luciferin.
1. The addition of one of the four deoxynucleoside
triphosphates (dNTPs) initiates the second step (dATPαS,
which is not a substrate for luciferase, is added instead of
dATP to avoid noise).
• DNA polymerase incorporates the correct,
complementary dNTPs onto the template. This
incorporation releases pyrophosphate (PPi).

2. ATP sulfurylase converts PPi to ATP in the presence of
adenosine 5´ phosphosulfate.
• This ATP acts as a substrate for the luciferase mediated
conversion of luciferin to oxyluciferin that generates
visible light in amounts that are proportional to the
amount of ATP.
• The light produced in the luciferase-catalyzed reaction is
detected by a camera and analyzed in a pyrogram.
3. Unincorporated nucleotides and ATP are degraded by the
apyrase, and the reaction can restart with another
nucleotide.

Limitation
• Currently, a limitation of the method is that individual
reads are about 300-500 nucleotides long, shorter than
the 800-1000 obtainable with chain termination methods
(e.g. Sanger sequencing).

Pyrosequencing cycle
• Add dATP. If light is emitted, your sequence
starts with A. If not, the dATP is degraded (or
elutes past immobilized primer).
• Add dGTP. If light is emitted, the next base
must be a G.
• Then add T, then C. You now know at least
one (maybe more) base of the sequence.
• Repeat!

Pyrosequencing output
Runs of bases produce higher peaks – for instance, the sequence for (a)
is GGCCCTTG. Sample (c) comes from a heterozygous individual
(hence the heights in multiples of ½)
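
Reading a pyrogram amounts to rounding each peak to a whole number of bases and writing the dispensed nucleotide that many times. The peak heights and dispensation order below are invented so that they decode to the GGCCCTTG example; they are not taken from the figure. Rounding to whole numbers assumes a single template, so it would not handle the half-height peaks of a heterozygous sample.

```python
def decode_pyrogram(peak_heights, dispensation_order="AGTC"):
    """Convert peak heights (in single-base units) into a sequence."""
    bases = []
    for i, height in enumerate(peak_heights):
        nucleotide = dispensation_order[i % len(dispensation_order)]
        bases.append(nucleotide * round(height))  # 0 copies if no light
    return "".join(bases)

# Hypothetical heights for the dispensations A G T C A G T C A G.
peaks = [0.05, 2.1, 0.0, 2.9, 0.1, 0.05, 1.95, 0.0, 0.1, 1.05]

print(decode_pyrogram(peaks))  # GGCCCTTG
```

The double peak on the first G and the triple peak on C are what distinguish GG and CCC from single bases.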

Roche 454
The Roche 454 sequencing system was the first commercial
platform for next generation sequencing. Its
main principle of sequencing is illustrated as follows.

a. Preparation of DNA Library
• DNA library construction in the 454 sequencing system is
different from that of Illumina.
• It uses a spray method (nebulization) to break DNA samples
into small fragments of 300-800 bp, and adds different
adapters at the two ends.
• Alternatively, the DNA is denatured, amplified with
primers, and cloned into specific vectors to construct
the single-stranded DNA library.

b. Emulsion PCR
• These single-stranded DNAs are attached to 28 µm beads
that are suspended in an emulsion.
• The key feature of emulsion PCR is the formation of a
large number of independent reaction spaces for DNA
amplification.
• The key technique is to separate the beads from one another
using the properties of the emulsion.

• The basic process is as follows.
• Before amplification of the sample DNA, an aqueous solution
containing all components of the PCR reaction is injected
onto the surface of mineral oil under high-speed rotation,
forming numerous small water droplets enclosed in
mineral oil.
• Each small droplet forms an independent PCR reaction
space. Ideally, each droplet contains only
one DNA template and one bead.

• The surface of each bead, enclosed in a small water
droplet, carries oligos complementary to the
adapters, so the single-stranded DNA can
bind specifically to the bead.
• At the same time, the incubation system contains PCR
reagents, so that each DNA fragment fixed on
a bead serves as the unique template for amplification.
• Moreover, the PCR products can also be captured on
magnetic beads.
• After the reaction is complete, the emulsion is
broken and the target DNA is collected.

• Finally, each small fragment will be amplified about 1
million times, so as to achieve the amount level required
by the sequencing process.

c. Pyrosequencing
• Before sequencing, the DNA-carrying beads are treated with
a polymerase and a single-strand DNA binding protein.
• The beads are then loaded onto a PicoTiterPlate (PTP).
• This plate has many small wells of 44 µm
diameter.
• Each well can accommodate only one bead, which
fixes the position of each bead
and makes it convenient to sequence.
• The method used in this sequencing process is
pyrosequencing.

• Smaller beads are then added to the wells, and the
sequencing reaction is started.
• The DNA sequencing reaction is based on the single-stranded
DNAs that have been amplified and fixed.
• If a dNTP pairs with the template DNA, a
pyrophosphate group is released after synthesis.
• The released pyrophosphate is converted to ATP by the
enzyme ATP sulfurylase.
• The ATP drives the luciferase-catalyzed oxidation of
luciferin, which emits light, and the CCD camera on
the other side of the PTP plate records the light
signal.

• Finally the results are processed by computer software.
• Because only one kind of dNTP is supplied at a time, the
base is identified by which dNTP produced light, and the
light intensity shows how many bases were incorporated.
• After the reaction, remaining ATP and dNTPs are degraded
by apyrase, which quenches the light signal, so that the
sequencing reaction can go into the next cycle.

454 sequencing
• Roche 454 sequencing can sequence much longer reads
than Illumina.
• Like Illumina, it does this by sequencing multiple reads at
once by reading optical signals as bases are added.
• As in Illumina, the DNA or RNA is fragmented into shorter
reads, in this case up to 1kb.
• Generic adaptors are added to the ends and these are
annealed to beads, one DNA fragment per bead.
• The fragments are then amplified by PCR using
adaptor-specific primers.
• Each bead is then placed in a single well of a slide. So
each well will contain a single bead, covered in many PCR
copies of a single sequence.

• The wells also contain DNA polymerase and sequencing
buffers.
• The slide is flooded with one of the four dNTP species.
Where this nucleotide is next in the sequence, it is added
to the sequence read.
• If that single base repeats, then more will be added.
• So if we flood with guanine nucleotides and the next base in
a sequence is G, one G will be added; however, if the next
part of the sequence is GGGG, then four Gs will be added.

• The addition of each nucleotide
releases a light signal.
• These locations of signals are
detected and used to determine
which beads the nucleotides are
added to.

• This dNTP mix is washed away. The next
dNTP mix is now added and the process
repeated, cycling through the four dNTPs.

• This kind of sequencing generates graphs for each
sequence read, showing the signal density for each
nucleotide wash.
• The sequence can then be determined computationally
from the signal density in each wash.

• All of the sequence reads we get from 454 will be different
lengths, because different numbers of bases will be added
with each cycle.

SOLiD
• A sequencer from Applied Biosystems that utilizes emulsion
PCR to immobilize the DNA library onto a solid support and
cyclic sequencing-by-ligation chemistry.
Sequencing Library Preparation and Immobilization
• The in vitro sequencing library preparation for SOLiD
involves fragmentation of the DNA sample to an
appropriate size range (400–850 bp), end repair and
ligation of “P1” and “P2” DNA adapters to the ends of the
library fragments

• Emulsion PCR is applied to immobilize the sequencing
library DNA onto “P1” coated paramagnetic beads.
• High-density, semi-ordered polony arrays are generated by
functionalizing the 3′ ends of the templates and
immobilizing the modified beads to a glass slide.
• The glass slides can be segmented into up to eight chambers to
facilitate scaling up the number of analyzed samples.

Sequencing by Ligation
• The SOLiD sequencing chemistry is based on ligation (Fig).
• A sequencing primer is hybridized to the “P1” adapter on
the immobilized beads.
• A pool of uniquely labeled oligonucleotides contains all
possible variations of the complementary bases for the
template sequence.

• SOLiD technology applies partially degenerate,
fluorescently labeled DNA octamers with a dinucleotide
complement sequence recognition core.
• These detection oligonucleotides are hybridized to the
template, and perfectly annealing sequences are ligated
to the primer.
• After imaging, unextended strands are capped and
fluorophores are cleaved.
• A new cycle begins 5 bases upstream from the priming
site.
• After seven sequencing cycles, the first sequencing
primer is peeled off and a second primer, starting at the
n-1 site, is hybridized to the template.

• In all, 5 sequencing primers (n, n-1, n-2, n-3, and n-4)
are utilized for the sequencing.
• As a result, the 35-base insert is sequenced twice to
improve the sequencing accuracy.
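
Because each probe recognises a pair of adjacent bases, every base contributes to two colour calls. The slides do not spell out the colour scheme, so the sketch below assumes the commonly described two-base encoding, in which the colour of a dinucleotide is the XOR of the two base codes (A=0, C=1, G=2, T=3). Colours are written as the numbers 0 to 3 and the sequences are invented.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"

def encode_colors(sequence):
    """Colour call (0-3) for each pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]

def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and the colour calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)

reference = "ATGGCA"
variant = "ATGTCA"  # invented single-base change at 0-based position 3

ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)
changed = sum(a != b for a, b in zip(ref_colors, var_colors))

print(ref_colors, var_colors, changed)
print(decode_colors("A", ref_colors))
```

A true single-base change alters two adjacent colours, whereas an isolated wrong colour call does not fit that pattern, which is one way reading every base twice helps accuracy.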
