 Next-Generation sequencing (NGS)
technologies – overview
 NGS targeted re-sequencing – fishing out the
regions of interest
 NGS workflow: data collection and processing
– the exome sequencing pipeline
Next-Generation sequencing
(NGS) technologies – overview
 The automated Sanger method is considered a ‘first-generation’ technology, and newer methods are referred to as next-generation sequencing (NGS).
 1953 Discovery of DNA double helix structure
 1977
◦ A Maxam and W Gilbert "DNA seq by chemical degradation"
◦ F Sanger "DNA sequencing with chain-terminating inhibitors"
 1984 DNA sequence of the Epstein-Barr virus, 170 kb
 1987 Applied Biosystems - first automated sequencer
 1991 Sequencing of human genome in Venter's lab
 1996 P. Nyrén and M Ronaghi - pyrosequencing
 2001 A draft sequence of the human genome
 2003 Human genome completed
 2004 454 Life Sciences markets first NGS machine
• Random genome sequencing: 25 Mb, 300k reads, 110 bp
• Sanger sequencing: targeted, 700-1000 bp
 The newer technologies constitute various
strategies that rely on a combination of
◦ Library/template preparation
◦ Sequencing and imaging
 Commercially available technologies
◦ Roche – 454
 GS FLX Titanium
 Junior
◦ Illumina
 HiSeq 2000
 MiSeq
◦ Life – SOLiD
 5500xl
 Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS
 Produce a non-biased source of nucleic acid
material from the genome
 Current methods:
◦ Randomly break genomic DNA into smaller fragments
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously
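The first two steps above (random fragmentation, adaptor ligation) can be sketched as a toy simulation. This is only an illustration: the genome is random sequence, the adaptor strings "AAAA" and "TTTT" are placeholders rather than real adaptor sequences, and the function names are invented for this example.

```python
import random


def fragment(sequence, mean_size, rng):
    """Cut `sequence` at random positions so fragments average about `mean_size` bases."""
    n_cuts = max(len(sequence) // mean_size - 1, 0)
    cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def ligate_adaptors(fragments, left, right):
    """Attach a placeholder adaptor to each end of every fragment."""
    return [left + frag + right for frag in fragments]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(2000))
fragments = fragment(genome, mean_size=200, rng=rng)
library = ligate_adaptors(fragments, "AAAA", "TTTT")  # hypothetical adaptors
print(len(library), "templates, sizes", sorted(len(f) for f in fragments))
```

Every fragment in `library` now carries the same known ends, which is what lets a single primer pair amplify and sequence an otherwise random collection of templates.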
 Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD
 Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS
 In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD
 Solid phase – Bridge PCR
◦ Illumina – HiSeq
[Figure panels: SOLiD; 454 (picotitre plate, pyrosequencing); HeliScope; PacBio; HiSeq]
 The major advance offered by NGS is the
ability to cheaply produce an enormous
volume of data
 The arrival of NGS technologies in the
marketplace has changed the way we think
about scientific approaches in basic, applied
and clinical research
NGS targeted re-sequencing – fishing out the regions of interest
• Random genome sequencing
• ???
• ???
• Sanger sequencing: targeted, 700-1000 bp
 Library/template preparation
 Library enrichment for target
 Sequencing and imaging
• Random genome sequencing
• Hybrid capture
• PCR based
• Sanger sequencing
 In solution
◦ Agilent
◦ Nimblegen
◦ ...
 Solid phase
◦ Agilent
◦ Nimblegen
◦ Febit
◦ ...
 In solution
◦ Relatively cheap
◦ High throughput is possible
◦ Small amounts of DNA sufficient
 Solid phase
◦ Straightforward method
◦ Flexible
◦ Higher amounts of DNA
• Uniplex
• Multiplex
• Fluidigm
• Raindance
• Multiplicon
• Long-range PCR products
• Raindance
• 48.48 Access Array
NGS workflow: data collection and processing – the exome sequencing pipeline
 The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons
 Protein coding genes
◦ Constitute only approximately 1% of the human genome
◦ It is estimated that 85% of the mutations with large
effects on disease-related traits can be found in
exons or splice sites
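The "approximately 1%" figure follows directly from the genome and exome sizes quoted above. A quick check, using only the slide's numbers (the variable names are illustrative):

```python
GENOME_BP = 3_000_000_000  # 3 Gb, from the slide
EXOME_BP = 30_000_000      # 30 Mb, from the slide
N_EXONS = 180_000          # from the slide

exome_fraction = EXOME_BP / GENOME_BP
mean_exon_bp = EXOME_BP / N_EXONS

print(f"exome fraction of genome: {exome_fraction:.1%}")
print(f"mean exon length: {mean_exon_bp:.0f} bp")
```

So sequencing about 1% of the genome is expected to cover the regions where an estimated 85% of large-effect disease mutations are found, which is the motivation for exome capture.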
gDNA (3 Gb) → Exome (38 Mb) → NGS
                      1/01/2010   1/08/2010   1/01/2011
exome capture              1100         860         300
Seq - 2.5 Gbases           5900        2600        1000
total cost                 7000        3460        1300
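The cost figures above can be sanity-checked in a few lines. This sketch assumes the three groups of numbers correspond, in order, to exome capture, sequencing (2.5 Gbases) and total cost at the three dates; the slide does not state the currency, so none is assumed here.

```python
# (capture, sequencing, total) per date, read off the chart above
costs = {
    "1/01/2010": (1100, 5900, 7000),
    "1/08/2010": (860, 2600, 3460),
    "1/01/2011": (300, 1000, 1300),
}

# Under the assumed reading, capture + sequencing should equal the total
for date, (capture, sequencing, total) in costs.items():
    assert capture + sequencing == total, date

totals = [total for _, _, total in costs.values()]
reduction = 1 - totals[-1] / totals[0]
print(f"total cost per exome fell by {reduction:.0%} in one year")
```

The totals add up under this reading, which supports it, and the drop is driven mostly by the sequencing component.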
 HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run
 Exome throughput
◦ 96 @ 60x coverage per run
◦ 3000 @ 60x coverage per year
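The throughput numbers above can be related to each other with simple arithmetic. The sketch below uses only figures from the slides, including the 38 Mb exome target from the earlier diagram; the reading of the gap between raw yield and on-target need is an assumption, not something the slides state.

```python
FLOW_CELLS = 2
GBASES_PER_FLOW_CELL = (200, 300)  # low and high estimate
RUN_DAYS = 10
EXOMES_PER_RUN = 96
COVERAGE = 60
TARGET_BP = 38_000_000             # exome size from the gDNA/exome diagram

run_yield_gb = tuple(FLOW_CELLS * g for g in GBASES_PER_FLOW_CELL)
raw_gb_per_exome = tuple(y / EXOMES_PER_RUN for y in run_yield_gb)
on_target_gb = COVERAGE * TARGET_BP / 1e9  # bases needed on target at 60x

runs_per_year = 365 // RUN_DAYS
max_exomes_per_year = runs_per_year * EXOMES_PER_RUN

print("run yield (Gb):", run_yield_gb)
print("raw Gb available per exome:", [round(g, 2) for g in raw_gb_per_exome])
print("on-target Gb needed per exome:", round(on_target_gb, 2))
print("upper bound exomes/year:", max_exomes_per_year)
```

Back-to-back runs would give a ceiling somewhat above the quoted 3000 exomes per year, so the slide's figure leaves room for downtime. The raw yield per exome is roughly two to three times the on-target requirement, a margin that presumably absorbs off-target reads, duplicates and uneven coverage.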
Data formatting & QC
Mapping & QC
Variant calling
Variant annotation
Variant filtering/comparison
DATA STORAGE
DATA GENERATION
DATA PROCESSING
REPORTING & VALIDATION
RESULTS
INTERPRETATION
Prepare sample library
Perform exome capture
Perform sequencing
DATA GENERATION
◦ Image processing
◦ Base calling
DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
DATA PROCESSING
1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
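Of the steps above, duplicate marking is the easiest to illustrate in isolation. A minimal sketch follows, assuming duplicates are reads that share chromosome, start position and strand, and that the copy with the highest summed base quality is kept. The `Read` fields and the function name are invented for this example; production tools apply more elaborate rules (paired-end mates, clipping, optical duplicates).

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    pos: int        # mapped start position
    strand: str     # "+" or "-"
    qual_sum: int   # sum of base qualities, used to pick the copy to keep


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)  # keep the best copy
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


reads = [
    Read("a", "chr1", 100, "+", 300),
    Read("b", "chr1", 100, "+", 350),  # same position and strand as "a"
    Read("c", "chr1", 100, "-", 200),  # opposite strand, not a duplicate
    Read("d", "chr2", 100, "+", 100),
]
flagged = mark_duplicates(reads)
print(flagged)
```

Duplicates are flagged rather than deleted so that later steps can choose to ignore them without losing the original data.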
DATA PROCESSING
◦ QC sequencing
◦ Mapping sequences
◦ QC capture experiment
QC NGS → Mapping → QC HC
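The slides do not say which metrics the capture QC reports. One commonly used measure is the on-target fraction: the share of mapped reads that start inside a capture target. A minimal sketch, with invented coordinates and a hypothetical function name:

```python
def on_target_fraction(read_starts, targets):
    """Fraction of reads whose start lies in a target interval.

    read_starts: list of (chrom, pos)
    targets: dict mapping chrom -> list of half-open (start, end) intervals
    """
    if not read_starts:
        return 0.0

    def on_target(chrom, pos):
        return any(start <= pos < end for start, end in targets.get(chrom, []))

    hits = sum(on_target(chrom, pos) for chrom, pos in read_starts)
    return hits / len(read_starts)


targets = {"chr1": [(100, 200)]}  # toy capture design
read_starts = [("chr1", 150), ("chr1", 250), ("chr2", 150), ("chr1", 100)]
fraction = on_target_fraction(read_starts, targets)
print(f"on-target fraction: {fraction:.0%}")
```

The linear scan over intervals is fine for a toy example; a real exome design with about 180 000 exons would call for a sorted or indexed interval lookup.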
DATA STORAGE
◦ Mapping results: 5 Gb / exome
DATA PROCESSING
◦ Variant calling
◦ Variant annotation
DATA STORAGE
◦ Variant calls: 100 Mb / exome
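The per-exome storage figures add up quickly at production scale. The sketch below combines them with the 3000 exomes per year quoted for the HiSeq. It assumes that "Gb" and "Mb" in the storage boxes mean gigabytes and megabytes in decimal units, which the slides do not state explicitly.

```python
SEQUENCE_GB = (10, 15)   # raw sequence data per exome, low and high
MAPPING_GB = 5           # mapping results per exome
VARIANTS_GB = 0.1        # variant calls per exome (100 Mb)
EXOMES_PER_YEAR = 3000   # from the HiSeq throughput slide

per_exome_gb = tuple(s + MAPPING_GB + VARIANTS_GB for s in SEQUENCE_GB)
per_year_tb = tuple(gb * EXOMES_PER_YEAR / 1000 for gb in per_exome_gb)

print("storage per exome (GB):", [round(g, 1) for g in per_exome_gb])
print("storage per year (TB):", [round(t, 1) for t in per_year_tb])
```

Raw sequence data dominates the total, while the variant calls that drive interpretation are comparatively tiny.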
[Chart, axis 0-1200000: INDEL, SNP]
[Chart, axis 0-1000000: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion]
[Chart, axis 0-20000: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]
[Chart, axis 0-500: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]
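The indel categories in the chart legends differ only in whether the length change is a multiple of three, that is, whether the reading frame is preserved. A minimal sketch of that rule, assuming the indel lies within coding sequence; the function name is illustrative and the labels simply follow the legends above:

```python
def classify_indel(ref, alt):
    """Label a coding indel using the category names from the chart legends."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("not an indel: ref and alt have equal length")
    kind = "insertion" if diff > 0 else "deletion"
    # A length change divisible by 3 adds or removes whole codons
    frame = "nonframeshift" if diff % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


examples = [("A", "ACGT"), ("A", "AC"), ("ACGT", "A"), ("ACG", "A")]
labels = [classify_indel(ref, alt) for ref, alt in examples]
for (ref, alt), label in zip(examples, labels):
    print(f"{ref} > {alt}: {label}")
```

Frameshift indels alter every downstream codon, which is why annotation separates them from in-frame changes.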
◦ Database of known variants (public & private)
◦ Variant filtering
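Variant filtering against known-variant databases can be sketched as a set lookup. This is an illustration only: the variants, the dictionary layout and the list of retained effect classes are made up for the example, and the slides do not specify the actual filter criteria.

```python
# Hypothetical known variants (public or private database), keyed by site and alleles
known_variants = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}

# Hypothetical annotated calls for one exome
calls = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G", "effect": "nonsynonymous SNV"},
    {"chrom": "chr1", "pos": 2000, "ref": "G", "alt": "T", "effect": "stopgain SNV"},
    {"chrom": "chr3", "pos": 300, "ref": "C", "alt": "CA", "effect": "frameshift insertion"},
    {"chrom": "chr3", "pos": 900, "ref": "T", "alt": "C", "effect": "synonymous SNV"},
]

# Assumed choice: keep only effect classes that change the protein
KEEP_EFFECTS = {
    "nonsynonymous SNV", "stopgain SNV", "stoploss SNV",
    "frameshift insertion", "frameshift deletion",
    "nonframeshift insertion", "nonframeshift deletion",
}


def filter_variants(calls, known, keep_effects):
    """Drop variants already in the known set or outside the kept effect classes."""
    return [
        v for v in calls
        if (v["chrom"], v["pos"], v["ref"], v["alt"]) not in known
        and v["effect"] in keep_effects
    ]


candidates = filter_variants(calls, known_variants, KEEP_EFFECTS)
for v in candidates:
    print(v["chrom"], v["pos"], v["ref"], v["alt"], v["effect"])
```

Filtering of this kind is what reduces the very large call sets shown in the charts to a shortlist small enough to validate.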
REPORTING & VALIDATION
RESULTS
◦ Validated variants in candidate genes
INTERPRETATION

2011 jeroen vanhoudt_ngs
