Next Gen Sequencing Platforms and
     Applications

                Matthew Tinning
                Australian Genome Research Facility




1 August 2012
Next-Gen Sequencing
    Technologies



Roche GS-FLX      Life Technologies SOLiD




Illumina HiSeq   Life Technologies Ion Torrent
Roche GS-FLX
Workflow
Sample Fragmentation


Library Preparation


emPCR Setup


emPCR Amplification


Pyrosequencing


Data Analysis
Pyrosequencing
emPCR
Emulsion PCR is a method of clonal amplification which allows
  for millions of unique PCRs to be performed at once through
  the generation of micro-reactors.
emPCR




The Water-in-Oil-Emulsion
Massively Parallel Sequencing
Data Analysis
          T Base    A Base    C Base     G Base
           Flow      Flow      Flow       Flow




                   Raw Image Files




             Image        Base-        Quality
           Processing     calling      Filtering




                        SFF File
454 Platform Updates

       GS20          • 100bp reads, ~20Mbp / run

      GS-FLX         • 250bp reads ~100 Mbp / run (7.5 hrs)

  GS-FLX Titanium    • 400bp reads ~400 Mbp / run (10 hrs)

GS-FLX Titanium Plus • 700 bp reads ~700 Mbp/run (18 hrs)

     GS Junior       • 400 bp reads ~ 35Mbp/run (10 hrs)
454 Sequencing Output
• *.sff (standard flowgram format)
• *.fna (fasta)
• *.qual (Phred quality scores)
Illumina HiSeq
Illumina Sequencing Technology
                       Robust Reversible Terminator Chemistry Foundation
                                                                                   3’ 5’



        DNA
    (0.1-1.0 ug)                                                    A       G
                                                                                        T
                                                                    C                   G
                                                                            A
                                                                                        C
                                                                        T               T
                                                                                        A
                                                                            C           C
                                                                                        G
                                                                     G                  A
                                                                                        T
                                                                            A           A
                                                                                        C
                                                                    T                   C
                                                                                        C
                                                                    C       G           G
                                                                                        A
                                                                                        T
                                                                        T               C
      Sample                                                                            G
                                                                                        A
    preparation                          Cluster growth                                 T
                                                                                   5’

                                                                                Sequencing
1        2         3      4    5     6      7      8      9
                                                                T G C T A C G A T …

                                                                   Base calling
Image acquisition
Platform Updates
      Solexa 1G           • 18bp reads, ~1Gbp / run

      Illumina GA         • 36bp reads ~3Gbp / run

     Illumina GAII        • 75bp paired reads ~10Gbp / run (8 days)

    Illumina GAIIx        • 75bp paired reads ~40Gbp / run (8 days)

 Illumina HiSeq 2000      • 100 bp paired reads ~200 Gbp/ run (10 days)

Illumina HiSeq, v3 SBS • 100bp paired reads ~600Gbp / run (12 days)
         MiSeq            • 150 paired reads ~1.5 Gb/run (27 hrs)


   Maximum yield / day 50,Gbp
   ~16x the human genome
Illumina Sequencing Output
• *.fastq (sequence and corresponding quality
  score encoded with an ASCII character, phred-
  like quality score + 33)
Illumina fastq
         1               2          3      4     5     67         8
@HWI-ST226:253:D14WFACXX:2:1101:2743:29814 1:N:0:ATCACG
TGCGGAAGGATCATTGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTT
GAAAAAAAAAAAAAAAAAATTA
+
B@CFFFFFHHFFHJIIGHIHIJJIJIIJJGDCHIIIJJJJJJJGJGIHHEH@)=F@EIGHHEHFFFFDCBBD:@CC@C
:<CDDDD50559<B########


 1.   unique instrument ID and run ID
 2.   Flow cell ID and lane
 3.   tile number within the flow cell lane
 4.   'x'-coordinate of the cluster within the tile
 5.   'y'-coordinate of the cluster within the tile
 6.   the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
 7.   N if the read passes filter, Y if read fails filter otherwise
 8.   Index sequence
Applied Biosystems SOLiD
Sequencing by Ligation
Base Interrogations
2 Base encoding



         AT
emPCR and Enrichment




3’ Modification allows covalent bonding to the slide surface
Platform Updates
                          • 50bp Paired reads ~50Gbp / run
  SOLiD 3                   (12 days)

                          • 50bp Paired reads ~100Gbp / run
  SOLiD 4                   (12 days)

                          • 75bp Paired reads ~300Gbp / run
   5500xl                   (14 days)

Maximum yield / day 21,000,000,000bp
7x the human genome
3.5 hours of sequencing for a 1 fold coverage.....
SOLiD Colour Space Reads

• *.csfasta (colour space fasta)
• *.qual (Phred quality scores)
       >853_17_1660_F3
       T32111011201320102312......

  AA      CC   GG    TT    0   Blue
  AC      CA   GT    TG    1   Green
  AG      CT   GA    TC    2   Yellow
  AT      CG   GC    TA    3   Red
Applied Biosystems:
 Ion Torrent PGM
Ion Torrent


• Ion Semiconductor Sequencing
• Detection of hydrogen ions during
the polymerization DNA
• Sequencing occurs in microwells
with ion sensors
• No modified nucleotides
• No optics
Ion Torrent
 dNTP                                      • DNA Ions  Sequence
                                           – Nucleotides flow sequentially over Ion
                                             semiconductor chip
                          H+               – One sensor per well per sequencing
                                             reaction
                                    ∆ pH   – Direct detection of natural DNA extension
                                           – Millions of sequencing reactions per chip
                                   ∆Q      – Fast cycle time, real time detection



Sensing Layer
           Sensor Plate
                                   ∆V



Bulk      Drain     Source     To column
                               receiver
Silicon Substrate
Ion Torrent: System Updates

314 Chip   • 100bp reads ~10 Mb/run (1.5 hrs)


           • 100 bp reads ~100 Mbp / run (2 hrs)
316 Chip   • 200 bp reads ~200 Mbp/run (3 hrs)


318 Chip   • 200 bp reads ~1 Gbp / run (4.5 hrs)
Ion Torrent Reads
• *.sff (standard flowgram format)
• *.fastq (sequence and corresponding quality
  score encoded with an ASCII character, phred-
  like quality score + 33)
Summary of NGS Platforms
• Clonal amplification of sequencing template
   – emPCR (454, SOLiD and Ion Torrent)
   – Bridge amplification (Illumina)
• Sequencing by Synthesis
   – 454 Pyrosequencing
   – Illumina Reversible Terminator Chemistry
   – Ion Torrent Ion Semiconductor Sequencing
• Sequencing by ligation
   – SOLiD – 2 base encoding
• Dramatic reduction in cost of sequencing
   – GS-FLX provides > 100x decrease in costs compared to
     Sanger Sequencing
   – HiSeq and SOLiD > 100x decrease in costs over GS-FLX
Applications


• DNA
   • Whole Genome
        – Shotgun & Mate Pair
   • Sequence Capture
   • Amplicon
• RNA
   • mRNA
   • small RNA
Next Gen Sequencing Library
        Preparation
Sample preparation

    mRNA                                 DNA

         chemical
                                           mechanical
Fragmentation


cDNA Synthesis                     Fragmentation




            Ligation of Amplification/
            Sequencing Adaptors


         Library Fragment Size Selection
Shotgun Libraries

• Illumina
   – Input: 1 ug of DNA
   – Fragmentation w/ Covaris
   – Size Selection w/ gel excission
       • Insert Size 300-400 bp
       • gel free method for captures
   – PCR “enrichment” (10 cycles)


• 454
   – Input 500 ng of DNA
   – Fragmentation w/ Nebulization
   – Small fragment removal (AMpure
     size exclusion)
       • Library size ~900 bp
Mate-Pair Libraries

•   Mate pair libraries for scafolding and
    structural variation
     – Input: 5-20 ug of DNA
     – 3kb, 8kb and 20Kb inserts
     – Size Select via gel electrophoresis
     – Adaptors for circularization via Cre
         recombinase (454)
     – PCR amplification (20 cycles)
Sequence Capture


•   Enrichment for specific targets via
    capture with oligonculeotide baits
     – Exome Capture
         • TruSeq Exome 62 Mb
         • NimbleGen SeqCap EZ Exome
           Library v2 & v4
         • Agilent SureSelect XT/2 All Exon
           v4 (+UTRS)
     – Custom Capture
         • TruSeq Custom Enrichment (700
           Kb- 15 Mb)
         • NimbleGen SeqCap EZ Choice (up
           to 50 Mb)
         • Agilent SureSelect XT/2 Custom
           (up to 34 Mb)
RNA-seq (cDNA libraries)

•   Shotgun library of cDNA

     – Isolation of Poly(A) RNA
     – (100 ng – 4 ug of total RNA)
     – Chemical Fragmentation of RNA
     – Random primed cDNA Synthesis &
       2nd strand Synthesis
     – Follows standard “DNA” library
       protocol
Illumina small RNA

•   Illumina Small RNA Sample
    Preparation
     – Input: 1-10 ug of total RNA
         • 50-200 ng of small RNA
     – RNA-adaptor ligation before cDNA
       synthesis
     – Small RNA size selection via PAGE
         • Library fragment ~145-160bp
           (insert 20-33 nucleotides)
     – PCR “amplification” (11 cycles)
Sample requirements


DNA – OD260/280 1.8-2.0   RNA – RIN > 8.0

gDNA                      1 µg (Illumina)
                          500 ng (454)
                          5-20 ug (454 Paired-End)
Total RNA                 100 ng- 4 µg (mRNA-seq)
                          1-10 ug (small RNA)
mRNA                      10-100 ng (Illumina)
                          200 ng (454)
small RNA                 50-200 ng

NGS technologies - platforms and applications

  • 1.
    Next Gen SequencingPlatforms and Applications Matthew Tinning Australian Genome Research Facility 1 August 2012
  • 2.
    Next-Gen Sequencing Technologies Roche GS-FLX Life Technologies SOLiD Illumina HiSeq Life Technologies Ion Torrent
  • 3.
  • 4.
    Workflow Sample Fragmentation Library Preparation emPCRSetup emPCR Amplification Pyrosequencing Data Analysis
  • 5.
  • 6.
    emPCR Emulsion PCR isa method of clonal amplification which allows for millions of unique PCRs to be performed at once through the generation of micro-reactors.
  • 7.
  • 8.
  • 9.
    Data Analysis T Base A Base C Base G Base Flow Flow Flow Flow Raw Image Files Image Base- Quality Processing calling Filtering SFF File
  • 10.
    454 Platform Updates GS20 • 100bp reads, ~20Mbp / run GS-FLX • 250bp reads ~100 Mbp / run (7.5 hrs) GS-FLX Titanium • 400bp reads ~400 Mbp / run (10 hrs) GS-FLX Titanium Plus • 700 bp reads ~700 Mbp/run (18 hrs) GS Junior • 400 bp reads ~ 35Mbp/run (10 hrs)
  • 11.
    454 Sequencing Output •*.sff (standard flowgram format) • *.fna (fasta) • *.qual (Phred quality scores)
  • 12.
  • 13.
    Illumina Sequencing Technology Robust Reversible Terminator Chemistry Foundation 3’ 5’ DNA (0.1-1.0 ug) A G T C G A C T T A C C G G A T A A C T C C C G G A T T C Sample G A preparation Cluster growth T 5’ Sequencing 1 2 3 4 5 6 7 8 9 T G C T A C G A T … Base calling Image acquisition
  • 14.
    Platform Updates Solexa 1G • 18bp reads, ~1Gbp / run Illumina GA • 36bp reads ~3Gbp / run Illumina GAII • 75bp paired reads ~10Gbp / run (8 days) Illumina GAIIx • 75bp paired reads ~40Gbp / run (8 days) Illumina HiSeq 2000 • 100 bp paired reads ~200 Gbp/ run (10 days) Illumina HiSeq, v3 SBS • 100bp paired reads ~600Gbp / run (12 days) MiSeq • 150 paired reads ~1.5 Gb/run (27 hrs) Maximum yield / day 50,Gbp ~16x the human genome
  • 15.
    Illumina Sequencing Output •*.fastq (sequence and corresponding quality score encoded with an ASCII character, phred- like quality score + 33)
  • 16.
    Illumina fastq 1 2 3 4 5 67 8 @HWI-ST226:253:D14WFACXX:2:1101:2743:29814 1:N:0:ATCACG TGCGGAAGGATCATTGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTT GAAAAAAAAAAAAAAAAAATTA + B@CFFFFFHHFFHJIIGHIHIJJIJIIJJGDCHIIIJJJJJJJGJGIHHEH@)=F@EIGHHEHFFFFDCBBD:@CC@C :<CDDDD50559<B######## 1. unique instrument ID and run ID 2. Flow cell ID and lane 3. tile number within the flow cell lane 4. 'x'-coordinate of the cluster within the tile 5. 'y'-coordinate of the cluster within the tile 6. the member of a pair, /1 or /2 (paired-end or mate-pair reads only) 7. N if the read passes filter, Y if read fails filter otherwise 8. Index sequence
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
    emPCR and Enrichment 3’Modification allows covalent bonding to the slide surface
  • 22.
    Platform Updates • 50bp Paired reads ~50Gbp / run SOLiD 3 (12 days) • 50bp Paired reads ~100Gbp / run SOLiD 4 (12 days) • 75bp Paired reads ~300Gbp / run 5500xl (14 days) Maximum yield / day 21,000,000,000bp 7x the human genome 3.5 hours of sequencing for a 1 fold coverage.....
  • 23.
    SOLiD Colour SpaceReads • *.csfasta (colour space fasta) • *.qual (Phred quality scores) >853_17_1660_F3 T32111011201320102312...... AA CC GG TT 0 Blue AC CA GT TG 1 Green AG CT GA TC 2 Yellow AT CG GC TA 3 Red
  • 24.
  • 25.
    Ion Torrent • IonSemiconductor Sequencing • Detection of hydrogen ions during the polymerization DNA • Sequencing occurs in microwells with ion sensors • No modified nucleotides • No optics
  • 26.
    Ion Torrent dNTP • DNA Ions  Sequence – Nucleotides flow sequentially over Ion semiconductor chip H+ – One sensor per well per sequencing reaction ∆ pH – Direct detection of natural DNA extension – Millions of sequencing reactions per chip ∆Q – Fast cycle time, real time detection Sensing Layer Sensor Plate ∆V Bulk Drain Source To column receiver Silicon Substrate
  • 27.
    Ion Torrent: SystemUpdates 314 Chip • 100bp reads ~10 Mb/run (1.5 hrs) • 100 bp reads ~100 Mbp / run (2 hrs) 316 Chip • 200 bp reads ~200 Mbp/run (3 hrs) 318 Chip • 200 bp reads ~1 Gbp / run (4.5 hrs)
  • 28.
    Ion Torrent Reads •*.sff (standard flowgram format) • *.fastq (sequence and corresponding quality score encoded with an ASCII character, phred- like quality score + 33)
  • 29.
    Summary of NGSPlatforms • Clonal amplification of sequencing template – emPCR (454, SOLiD and Ion Torrent) – Bridge amplification (Illumina) • Sequencing by Synthesis – 454 Pyrosequencing – Illumina Reversible Terminator Chemistry – Ion Torrent Ion Semiconductor Sequencing • Sequencing by ligation – SOLiD – 2 base encoding • Dramatic reduction in cost of sequencing – GS-FLX provides > 100x decrease in costs compared to Sanger Sequencing – HiSeq and SOLiD > 100x decrease in costs over GS-FLX
  • 30.
    Applications • DNA • Whole Genome – Shotgun & Mate Pair • Sequence Capture • Amplicon • RNA • mRNA • small RNA
  • 31.
    Next Gen SequencingLibrary Preparation
  • 32.
    Sample preparation mRNA DNA chemical mechanical Fragmentation cDNA Synthesis Fragmentation Ligation of Amplification/ Sequencing Adaptors Library Fragment Size Selection
  • 33.
    Shotgun Libraries • Illumina – Input: 1 ug of DNA – Fragmentation w/ Covaris – Size Selection w/ gel excission • Insert Size 300-400 bp • gel free method for captures – PCR “enrichment” (10 cycles) • 454 – Input 500 ng of DNA – Fragmentation w/ Nebulization – Small fragment removal (AMpure size exclusion) • Library size ~900 bp
  • 34.
    Mate-Pair Libraries • Mate pair libraries for scafolding and structural variation – Input: 5-20 ug of DNA – 3kb, 8kb and 20Kb inserts – Size Select via gel electrophoresis – Adaptors for circularization via Cre recombinase (454) – PCR amplification (20 cycles)
  • 35.
    Sequence Capture • Enrichment for specific targets via capture with oligonculeotide baits – Exome Capture • TruSeq Exome 62 Mb • NimbleGen SeqCap EZ Exome Library v2 & v4 • Agilent SureSelect XT/2 All Exon v4 (+UTRS) – Custom Capture • TruSeq Custom Enrichment (700 Kb- 15 Mb) • NimbleGen SeqCap EZ Choice (up to 50 Mb) • Agilent SureSelect XT/2 Custom (up to 34 Mb)
  • 36.
    RNA-seq (cDNA libraries) • Shotgun library of cDNA – Isolation of Poly(A) RNA – (100 ng – 4 ug of total RNA) – Chemical Fragmentation of RNA – Random primed cDNA Synthesis & 2nd strand Synthesis – Follows standard “DNA” library protocol
  • 37.
    Illumina small RNA • Illumina Small RNA Sample Preparation – Input: 1-10 ug of total RNA • 50-200 ng of small RNA – RNA-adaptor ligation before cDNA synthesis – Small RNA size selection via PAGE • Library fragment ~145-160bp (insert 20-33 nucleotides) – PCR “amplification” (11 cycles)
  • 38.
    Sample requirements DNA –OD260/280 1.8-2.0 RNA – RIN > 8.0 gDNA 1 µg (Illumina) 500 ng (454) 5-20 ug (454 Paired-End) Total RNA 100 ng- 4 µg (mRNA-seq) 1-10 ug (small RNA) mRNA 10-100 ng (Illumina) 200 ng (454) small RNA 50-200 ng