Thank you for joining us in our tech talk series. Today’s tech talk will be an overview of next generation sequencing platforms.
I will start of by giving a brief history on the origin of next generation sequencing. I will then cover some of the common steps in doing next-gen sequening. We will then cover in more detail the specific technologies behind some of the more popular sequencing platforms we use as well discuss some of the new comers in the sequencing Market. These next generation sequencing technologies have been mostly developed and sold by individual companies. I am however not affiliated with any of them. I will try to present both the advantages and disadvantage of each system.
Although it has become quite common to do nex-gen sequencing studies many biomedical research labs, the technologies behind the data have really only come about in the past decade. The story of next generation sequencing start with the Human Genome Project.2000 First draft of Human Genome Complete. - This set the roadmap for modern biomedical research. - Most of the DNA sequencing for the Human Genome Project was based on Sanger Sequencing. - Sanger Sequencing methods were slow and expensive. Estimated cost of the human genome project is 3 billion dollars and 10 years to generate the first draft of Human Genome. - After the completion of the first Human Genome there was an increased focus on improving the machinery used to generate sequencing data.Introduction of Next Generation Sequencing Instruments. - Next Generation Sequencing refers to the advancement of sequencing technologies that allowed for significant improvements in quantity, speed, and cost of generating sequencing data. - In 2004 first parallelized version of 454 pyrosequencing instrument was commercialize. 6-fold cost reduction compared to capillary based Sanger Sequencing. - Since then a wide range of similar sequencing technologies have been introduced bringing down the cost even further. - We are now getting very close to the $1000 per sample mark for Whole Genome Sequencing.
Single-molecule template sequencing avoids some of these problems by circumventing the amplification step all together. I will go over an example of a platfDNA amplification results in multiple copies of the initial DNA fragment. For most NGS platforms, except for single molecule sequencing instruments, this is necessary because sequencing takes places on all the fragments at the same time. The sequencing signal from all the fragments are read in at once. This is also the main reason why the read lengths of most NGS platforms are short. Incomplete extension or over extension of the sequencing stand can result in strand dephasing which causes noise in the detection signal. Accumulation of dephased strands deceases the signal to noise ratio to a point where the sequence can no longer be read causing the termination of the sequencing read. Real-time sequencers can avoid this problem leading to much longer sequencing reads, but they suffer from other limitations ( which I will go over later. ) ormthat uses single-molecule sequencing.
The main purpose for dna amplification is so that you can sequence all the fragments at once. This is essentially what sets Next generation sequencing apart from Sanger. In Sanger …
SOLiD has lowest error rate. Good for resequencing.Illumina has the highest throughput / cost.PacBio has the longest read length. Up to 15,000 bp. Applications in DeNovo sequencing and structural variation studies.Ion Torrent from Life Technologies has two instruments. The Personal Genome Machine which has been out since 2010. It’s mostly only applicable for small genome sequencing like bacterial genomes. The Ion Proton, which promises much greater throughput and it has been recently made commercially available. There is however limited information on it’s actual performance.
Emulsion PCR Fragment DNA and attache adapters.PCR is carried out in a oil mixture containing beads and reagents.Each of the beads has a small adaptor that is complementary to the adaptor attached to the DNA fragment.Once the DNA is added to the emulsion ideally you get one fragment attached to each bead.You then carry out the PCR reaction which results in multiple clonal DNA fragments on each bead.The beads are then lowered on plates containing millions of wells roughly the size of each bead.
Additional beads are added to each well containing Sulphurulase and Luciferase.Reagents are flowed through the top of the plates for the sequencing reaction to occur. A key step to the process is that only one nucleotide type (ex: cytosine) is added to the plate.If the added nucleotide is complementary to the template strand it is incorporated into the growing strand by the polymerase releasing a pyrophosphate.The pyrophosphate is then picked up by the catalytic enzymes causing a chain reaction which ultamately emits light.The light is then picked up by CCD camera and recoded on a flowgram.The cycle repeated between reagent and washing.A key limitation to this procedure is that if you have a chain of identical nucleotides on the template sequences you will get the addition of multiple nucleotides in a single reaction cycle. The light intensity will be proportional to the number of nucleotides added but the detection limit is around 6 or 7. The camera will therefore not be able to detect homopolymer runs that are longer than 6 or 7 nucleotides.The most common error rate in pyrosequencing is therefore indels which is around 1%. Substitution error rate however are very low which makes this platform great for targeted resequencing and studies where you are looking for single nucleotide substitutions.
Illumina Library Prep is done through a process called bridge amplification.1. PCR is done on glass plate called flow cells.2. DNA is fragmented and ligate adaptors are added to both ends similar to emulsion PCR.3. Single stranded fragments are attached to surface of flowcells. The flowcell contains the complementary primers to the fragment adaptors.4. PCR reaction is carried out which extents the bridge.5. Through the PCR cycle multiple single strand copies are made creating a dense cluster on the flowcell.
Improve chemistry and detection for Illumina HiSeq/MiSeq
Summary – Due to the 2-base encoding system you get very low error rate. (Error rate of around .01)It is still however limited by short reads. (35bp)
Obvious advantage is no need for clonal amplification. Reduces amplification bias.
Indels errors occur when multiple nucleotide are incorporated too fast for the camera to detect.Improvments to the technology primarily have come with advances in it’s chemistry. Primarily it’s polymerase enzyme. Improvements include higher nucleotide incorporation accuracy. Higher tolerance to heat ( longer reads ).
A Comparison of NGS Platforms.
NGS Overview Part I:A Comparison of Next-Generation Sequencing Platforms. MIN SOO KIM APRIL 1, 2013 QUANTITATIVE BIOMEDICAL RESEARCH CENTER DEPARTMENT OF CLINICAL SCIENCES UT SOUTHWESTERN
OUTLINE Background. Common Pipeline. Library Prep. Sequencing – Massively Parallel Sequencing. Bioinformatics - Data Analysis . Popular Platforms: Roche 454 , AB SOLiD, Illumina(HiSeq,MiSeq). Newer Platforms (Third Generation): Ion Torrent, PacBio RS, Oxford Nanopore.
Next Generation Sequencing PipelineDNA Sample NGS Instrument Data Library Data Sequencing Preparation Analysis
Library Preparation DNA samples are randomly fragmented and platform-specific adaptors are added to the flanking ends to produce a “library”. Library is then amplified through PCR. (Platform- specific amplification e.g. beads or glass) Amplification Introduces Bias: Amplification bias against AT, GC rich regions. (corrected by adding PCR additives.) Alteration of representational abundances(duplicates). Important for quantitative applications like RNA-seq. Overrepresentation of smaller fragments. (corrected by running fewer PCR cycles.)
Massively Parallel Sequencing In Sanger-sequencing the DNA synthesis and detection steps are two separate procedures (slow). Next generation sequencing relies on coupling the DNA synthesis and detection (sequencing by synthesis) and multiple sequencing reactions are run simultaneously (Massively Parallel Sequencing). For most NGS platforms desynchronization of reads during the sequencing and detection cycle is the main cause of sequencing errors and shorter reads.
Platform Chemistry Read Run Gb/Run Advantage Disadvantage Length Time 454 GS Pyro- 500 8 hrs. 0.04 Long Read High error rate Junior sequencing Length in (Roche) homopolymer 454 GS Pyro- 700 23 hrs. 0.7 Long Read High error rate FLX+ sequencing Length in (Roche) homopolymer HiSeq Reversible 2*100 2 days 120 High- Short reads(Illumina) Terminator (rapid (rapid throughput Long run time mode) mode) / cost (normal mode) SOLiD Ligation 85 8 days 150 Low Error Short reads (Life) Rate Long run timeIon Proton Proton 200 2 hrs. 100 Short Run New* (Life) Detection times 3000 No PCRPacBio RS Real-time (up to 20 min 3 Longest High Error Sequencing 15,000) Read Rate Length
Roche 454 – Library Prep.Emulsion PCR (emPCR)1. DNA fragmentations and adaptor ligation.2. DNA fragments are added to an oil mixture containing millions of beads.3. Emulsion PCR results in multiple copies of the fragment.4. Beads are deposited on plate wells ready for sequencing.
Illumina – Sequence by Synthesis 1. Add dye-labeled nucleotides. 2. Scan and detect nucleotide specific fluorescence. 3. Remove 3’ – blocking group (Reversible termination). 4. Cleave fluorescent group. 5. Rinse and Repeat.http://hmg.oxfordjournals.org
Illumina – Sequencing by Synthesis Platform Run Yield Error Error Time (GB) Type Rate GAIIx 14 days 96 Sub 0.1 HiSeq 10/2 600/120 Sub 0.03* 2000/2500 days MiSeq 1 day 2 Sub 0.03*
IlluminaAdvantages High throughput / cost. Suitable for a wide rage of applications most notably whole genome sequencing.Disadvantages Substitution error rates (recently improved). Lagging strand dephasing causes sequence quality deterioration towards the end of read.
SOLiD (Life/AB) SOLiD: Sequencing by Oligonucleotide Ligation and Detection. Library Prep: emPCR “2-base encoding” – Instead of the typical single dNTP addition, two base matching probes are used. (possible 16 probes). Color Space – four color sequencing encoding further increases accuracy.
SOLiD (Life/AB) 1. Anneal primer and hybridize probe(8-mer). 2. Ligation and Detection. 3. Cleave fluorescent tail(3-mer). 4. Repeat ligation cycle. Repeat steps 1-4 with (n-1) primer.
SOLiD – Color Space • 16 possible base combination are represented by 4 colors. • All possible sequencing combination need to be decoded.
SOLiD – SNP DetectionSummary: SOLiD has one of the lowest error-rates (~0.01) due to 2-baseencoding. It is however still limited by short read lengths ( 35 bp / 85 bp for PE).
Ion Torrent (Life Technologies) • Similar to pyrosequencing but uses semiconducting chip to detect dNTP incorporation. • The chip measure differences in pH. • Shown to have problems with homopolymer reads and coverage bias with GC-rich regions. • Ion Proton™ promises higher output and longer reads.J. M. Rothberg, et al. Nature (2011) 475:348-352
PacBio RS Single Molecule Sequencing – instead of sequencing clonally amplified templates from beads (Pyro) or clusters (Illumina) DNA synthesis is detected on a single DNA strand. Zero-mode waveguide (ZMW) • DNA polymerase is affixed to the bottom of a tiny hole (~70nm). • Only the bottom portion of the hole is illuminated allowing for detection of incorporation of dye-labeled nucleotide.
PacBio RS Real-time Sequencing – Unlike reversible termination methods (Illumina) the DNA synthesis process is never halted. Detection occurs in real-time. Library Prep. • DNA template is circularized by the use of “bell” shaped adapters. • As long as the polymerase is stable this allows for continuous sequencing of both strands.
PacBio RSAdvantages No amplification required. Extremely long read lengths. Average 2500 bp. Longest 15,000bp.Disadvantages High error rates. Error rate of ~15% for Indels. 1% Substitutions.
Oxford Nanopore Announced Feb. 2012 at ABGT conference. Measure changes in ion flow through nanopore. Potential for long read lengths and short sequencing times.http://www2.technologyreview.com
Thank you.AcknowledgementsDr. Tae Hyun HwangReferencesMardis, E. R. A decade’s perspective on DNA sequencing technology.Nature (2011) 470:198 - 203Metzker, M.L. Sequencing technologies – the next generation. NatureReview Genetics(2010) 11:31 – 46N. J. Loman, C. Constantinidou, Jacqueline Z. M. Chan, M. Halachev, M.Sergeant, C. W. Penn, E. R. Robinson & M. J. Pallen. High-throughputbacterial genome sequencing: an embarrassment of choice, a world ofopportunity. Nature Review Microbiology 10, 599-606.