Nucleosides
• Base linked to a 2-deoxy-D-ribose at the 1’ carbon
Nucleotides
• Nucleosides with a phosphate at the 5’ carbon
Phosphodiester Bond
DNA Polymerase
Determining the Sequence of DNA
Methods:
1. Chain termination or dideoxy method
   • F. Sanger
2. Shotgun sequence method
3. 2nd generation sequence methods
   • Pyrosequencing
Dideoxy (Sanger) Method4 Steps:
1. Denaturation
2. Primer attachment and extension of bases
3. Termination
4. Gel electrophoresis
[Figure: numbered steps of the dideoxy method, ending with gel electrophoresis]
• ddNTP: 2’,3’-dideoxynucleotide
• No 3’ hydroxyl
• Terminates the chain when incorporated
• Add enough ddNTP that termination happens at random at every occurrence of the base, so that every position is represented among the fragments
• Run four separate reactions, each with a different ddNTP
• Run on a gel in four separate lanes
• Read the gel from the bottom up
Determining DNA Sequence
Originally, two methods were invented around 1976, but only one is widely used: the chain-termination method invented by Fred Sanger.
  ○ The other method is the Maxam-Gilbert chemical degradation method, which is still used for specialized purposes, such as analyzing DNA-protein interactions.
More recently, several cheaper and faster alternatives have been invented. It is hard to know which of these methods, or possibly another method, will ultimately become standard. We will discuss two of them: 454 pyrosequencing and Illumina/Solexa sequencing.
Sanger Sequencing
• Uses DNA polymerase to synthesize a second DNA strand that is labeled. DNA polymerase always adds new bases to the 3’ end of a primer that is base-paired to the template DNA.
  ○ The DNA polymerase is modified to eliminate its editing (proofreading) function.
• Also uses chain terminator nucleotides: dideoxy nucleotides (ddNTPs), which lack the -OH group on the 3’ carbon of the deoxyribose. When DNA polymerase inserts one of these ddNTPs into the growing DNA chain, the chain terminates, as nothing can be added to its 3’ end.
Sequencing Reaction
• The template DNA is usually single-stranded DNA, which can be produced from plasmid cloning vectors that contain the origin of replication from a single-stranded bacteriophage such as M13 or fd. The primer is complementary to the region in the vector adjacent to the multiple cloning site.
• Sequencing is done with 4 separate reactions, one for each DNA base.
• All 4 reactions contain the 4 normal dNTPs, but each reaction also contains one of the ddNTPs.
• In each reaction, DNA polymerase starts creating the second strand beginning at the primer.
• When DNA polymerase reaches a base for which the ddNTP is present, the chain will either:
  ○ terminate, if a ddNTP is added, or
  ○ continue, if the corresponding dNTP is added.
  ○ Which one happens is random, based on the ratio of dNTP to ddNTP in the tube.
• However, all the second strands in, say, the A tube will end at some A base: you get a collection of DNAs that end at each of the A's in the region being sequenced.
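The random termination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sanger_tube`, the termination probability `p_terminate` (standing in for the ddNTP:dNTP ratio), and the number of molecules are all hypothetical, and `new_strand` is the strand the polymerase synthesizes, written 5’ to 3’ starting just after the primer.

```python
import random

def sanger_tube(new_strand, dd_base, n_molecules=2000, p_terminate=0.2, seed=0):
    """Simulate one of the four reactions.

    Returns the set of fragment lengths (bases added after the primer)
    found in the tube that contains the ddNTP for `dd_base`.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            # At each matching base the polymerase picks a ddNTP (stop)
            # or a normal dNTP (continue); the choice is random.
            if base == dd_base and rng.random() < p_terminate:
                lengths.add(position)
                break
    return lengths

new_strand = "AGCATGCTGCAGTCATGCT"
lanes = {base: sanger_tube(new_strand, base) for base in "ACGT"}
for base in "ACGT":
    print(base, sorted(lanes[base]))
```

With enough molecules, the A tube ends up holding one fragment length for every A in the strand, and likewise for the other three tubes. If `p_terminate` is set too high, most chains stop at the first few matching bases and the longer fragments become rare.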
Electrophoresis
• The newly synthesized DNA from the 4 reactions is then run (in separate lanes) on an electrophoresis gel.
• The DNA bands fall into a ladder-like sequence, spaced one base apart. The actual sequence can be read from the bottom of the gel up.
• Automated sequencers use 4 different fluorescent dyes as tags attached to the dideoxy nucleotides and run all 4 reactions in the same lane of the gel.
  ○ Today’s sequencers use capillary electrophoresis instead of slab gels.
  ○ Radioactive nucleotides (³²P) are used for non-automated sequencing.
• Sequencing reactions usually produce about 500-1000 bp of good sequence.
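Reading the gel from the bottom up amounts to sorting the bands by fragment length, shortest first, and noting which lane each band sits in. A minimal sketch, assuming a hypothetical `lanes` dictionary that maps each base to the fragment lengths seen in its lane:

```python
def read_gel(lanes):
    """Read a four-lane gel from the bottom (shortest fragment) up.

    `lanes` maps a base to the fragment lengths in that lane. The result
    is the newly synthesized strand, 5' to 3', which is the complement
    of the template.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

# Illustrative band positions, not real data.
example_lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(read_gel(example_lanes))  # GATTACA
```

The shortest fragments migrate farthest, which is why the bottom of the gel corresponds to the bases closest to the primer.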
Next Generation Sequencing
• Recently a number of faster and cheaper sequencing methods have been developed.
• The Archon X Prize (2006): “the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 100,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $10,000 (US) per genome.”
• Currently there is a push for (and NIH grant money for) developing a method that will sequence the entire human genome for $1000, to allow personal genomics.
• One of the most widely used new methods combines the pyrosequencing biochemical reactions (invented by Nyren and Ronaghi in 1996) with the massively parallel microfluidics technology invented by the 454 Life Sciences company. We can call this combined technology “454 sequencing”.
• Applications:
  ○ sequencing of whole bacterial genomes in a single run
  ○ sequencing genomes of individuals
  ○ metagenomics: sequencing DNA extracted from environmental samples
  ○ looking for rare variants in a single amplified region, in tumors or viral infections
  ○ transcriptome sequencing: total cellular mRNA converted to cDNA.
Pyrosequencing Biochemistry
• In DNA synthesis, a dNTP is attached to the 3’ end of the growing DNA strand. The two phosphates on the end are released as pyrophosphate (PPi).
• ATP sulfurylase uses PPi and adenosine 5’-phosphosulfate to make ATP.
  ○ ATP sulfurylase is normally used in sulfur assimilation: it converts ATP and inorganic sulfate to adenosine 5’-phosphosulfate and PPi. However, the reaction is reversed in pyrosequencing.
• Luciferase is the enzyme that causes fireflies to glow. It uses luciferin and ATP as substrates, converting luciferin to oxyluciferin and releasing visible light.
• The amount of light released is proportional to the number of nucleotides added to the new DNA strand.
• After the reaction has completed, apyrase is added to destroy any leftover dNTPs.
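Because the light is proportional to the number of nucleotides added, a run of identical bases gives a proportionally taller signal in a single nucleotide addition. The sketch below idealizes this as exact integer signals. The function names, the cyclic dispensing order `flow_order`, and the noise-free signal are all assumptions made for illustration.

```python
def pyrogram(new_strand, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) pairs for successive nucleotide additions.

    The signal is the number of bases incorporated in that addition:
    0 if the dispensed nucleotide does not match the next base,
    n for a run of n identical bases.
    """
    signals = []
    pos = 0
    flows = 0
    while pos < len(new_strand) and flows < max_flows:
        nucleotide = flow_order[flows % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flows += 1  # leftover nucleotide is degraded before the next addition
    return signals

def call_bases(signals):
    """Rebuild the synthesized strand from the idealized signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = pyrogram("GGATTC")
print(signals)
print(call_bases(signals))  # GGATTC
```

In practice the signal is an analog light intensity, so long runs of the same base are harder to count exactly than this idealized version suggests.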
Sequence Assembly
• DNA is sequenced in very small fragments: at most, 1000 bp. Compare this to the size of the human genome: 3,000,000,000 bp. How to get the complete sequence?
• In the early days (1980s), genome sequencing was done by chromosome walking (aka primer walking): sequence a region, then make primers from the ends to extend the sequence. Repeat until the target gene is reached.
  ○ The cystic fibrosis gene was identified by walking about 500 kbp from a closely linked genetic marker, a process that took a long time and was very expensive.
  ○ Primer walking is still useful for fairly short DNA molecules, say 1-10 kbp.
Shotgun Sequencing
• Shotgun sequencing is what is typically done today: DNA is fragmented randomly and enough fragments are sequenced so each base is read 10 times or more on average. The overlapping fragments (“reads”) are then assembled into a complete sequence.
• For large genomes, hierarchical shotgun sequencing is a useful technique: first break up the genome into an ordered set of cloned fragments (scaffolds), usually BAC clones. Each BAC is shotgun sequenced separately.
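The "read 10 times or more on average" target translates directly into a number of reads: average coverage is (number of reads × read length) / genome size. A rough sketch follows; the function names are hypothetical, and the estimate of uncovered bases assumes reads land uniformly at random (a Poisson model), which real libraries only approximate.

```python
import math

def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads so that each base is read `coverage` times on average."""
    return math.ceil(coverage * genome_bp / read_bp)

def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, under a Poisson model."""
    return math.exp(-coverage)

human_genome_bp = 3_000_000_000
print(reads_needed(human_genome_bp, 1000, 10))  # 30000000
print(reads_needed(human_genome_bp, 500, 10))   # 60000000
print(fraction_uncovered(10))
```

Even at 10-fold average coverage a small fraction of bases is expected to be missed by chance, which is one reason high coverage is used.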
Shotgun Sequencing
• Used to sequence whole genomes
Steps:
• DNA is broken up randomly into smaller fragments
• Dideoxy method produces reads
• Look for overlap of reads

Strand                    Sequence
First Shotgun Sequence    AGCATGCTGCAGTCATGCT-------
                          -------------------TAGGCTA
Second Shotgun Sequence   AGCATG--------------------
                          ------CTGCAGTCATGCTTAGGCTA
Reconstruction            AGCATGCTGCAGTCATGCTTAGGCTA
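The "look for overlap of reads" step can be sketched with a simple greedy merge applied to the four reads in the example above. This is a toy illustration, not how production assemblers work: the function names and the `min_len` threshold are assumptions, and the reads are taken to be error-free and all from the same strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that lie entirely inside another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads

reads = ["AGCATGCTGCAGTCATGCT", "TAGGCTA", "AGCATG", "CTGCAGTCATGCTTAGGCTA"]
print(greedy_assemble(reads))  # ['AGCATGCTGCAGTCATGCTTAGGCTA']
```

Here the two long reads share a 13-base overlap (CTGCAGTCATGCT), and the two short reads are already contained in the long ones, so a single merge recovers the reconstruction.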
Human Genome Project
Began in 1990
Why?
Human evolution
Nature versus nurture
Causes of disease
2nd Generation: Pyrosequencing
Sequencing by synthesis
Advantages:
Accurate
Parallel processing
Easily automated
Eliminates the need for labeled primers and nucleotides
No need for gel electrophoresis
PyrosequencingBasic idea:
Visible light is generated and is proportional to the
number of incorporated nucleotides
1pmol DNA = 6*1011
ATP = 6*109
photons at 560nm
DNA Polymerase I from E.coli.
pyrophospate
From fireflies, oxidizes luciferin and generates light
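The figure of 6 × 10¹¹ for 1 pmol follows from Avogadro's number, and comparing it with the quoted 6 × 10⁹ photons gives the implied light yield per ATP. A quick arithmetic check (the variable names are only for illustration):

```python
AVOGADRO = 6.02214076e23  # molecules per mole

dna_moles = 1e-12                    # 1 pmol of DNA template
molecules = dna_moles * AVOGADRO     # about 6 x 10^11 templates
atp_per_step = molecules             # one PPi, hence one ATP, per incorporated nucleotide
photons_quoted = 6e9                 # value quoted on the slide

photons_per_atp = photons_quoted / 6e11
print(f"{molecules:.2e} template molecules")
print(f"about {photons_per_atp:.0%} of ATP molecules yield a detected photon")
```

So the quoted numbers correspond to roughly one photon per hundred ATP molecules.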
Pyrosequencing
• 2nd method: liquid phase
  ○ 3 enzymes + apyrase (nucleotide-degrading enzyme)
  ○ Eliminates the need for a washing step
• In the well of a microtiter plate:
  ○ primed DNA template
  ○ 4 enzymes
• Nucleotides are added stepwise
• The nucleotide-degrading enzyme degrades the previous nucleotides
Pyrosequencing Method:
Pyrosequencing Results:
Summary
• DNA sequencing is a common procedure
• Dideoxy method
  ○ Chain termination method
  ○ Best for small DNA segments
• Whole genome shotgun sequencing
  ○ Used to sequence the human genome
  ○ Fragments a larger DNA strand into manageable chunks
• Pyrosequencing
  ○ Sequencing by synthesis
  ○ Accurate and fast
