2011 jeroen vanhoudt_ngs

• Next-Generation sequencing (NGS) technologies – overview
• NGS targeted re-sequencing – fishing out the regions of interest
• NGS workflow: data collection and processing – the exome sequencing pipeline

Next-Generation sequencing (NGS) technologies – overview

• The automated Sanger method is considered a ‘first-generation’ technology, and newer methods are referred to as next-generation sequencing (NGS).

• 1953 Discovery of the DNA double helix structure
• 1977
◦ A. Maxam and W. Gilbert – DNA sequencing by chemical degradation
◦ F. Sanger – "DNA sequencing with chain-terminating inhibitors"
• 1984 DNA sequence of the Epstein-Barr virus, 170 kb
• 1987 Applied Biosystems – first automated sequencer
• 1991 Sequencing of human genome in Venter's lab
• 1996 P. Nyrén and M. Ronaghi – pyrosequencing
• 2001 A draft sequence of the human genome
• 2003 Human genome completed
• 2004 454 Life Sciences markets the first NGS machine

Random genome sequencing: 25 Mb, 300k reads, 110 bp
Sanger sequencing: targeted, 700-1000 bp

• The newer technologies comprise various strategies that rely on a combination of:
◦ Library/template preparation
◦ Sequencing and imaging

• Commercially available technologies
◦ Roche – 454
   ▪ GS FLX Titanium
   ▪ GS Junior
◦ Illumina
   ▪ HiSeq 2000
   ▪ MiSeq
◦ Life Technologies
   ▪ SOLiD 5500xl
   ▪ Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS


• Produce a non-biased source of nucleic acid material from the genome
• Current methods:
◦ Randomly break genomic DNA into smaller fragments
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously

• Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD
• Single-molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

• In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD
• Solid phase – bridge PCR
◦ Illumina – HiSeq
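In emulsion PCR the library is diluted so that most droplets receive at most one template molecule, which is what makes the amplified beads clonal. Assuming templates are distributed over droplets by a Poisson process (a standard simplification; the λ values below are illustrative, not taken from the slides), this sketch shows the trade-off between empty droplets and mixed, non-clonal ones.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k templates in a droplet) when droplets get lam templates on average."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_summary(lam: float) -> dict[str, float]:
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {
        "empty": empty,                 # no template
        "clonal": single,               # exactly one template: clonal amplification
        "mixed": 1.0 - empty - single,  # two or more templates: mixed signal
        # Among droplets that received any template, the share that are clonal.
        "clonal_given_loaded": single / (1.0 - empty),
    }


for lam in (0.1, 0.5, 1.0):
    s = loading_summary(lam)
    print(f"lambda={lam}: empty={s['empty']:.3f} clonal={s['clonal']:.3f} mixed={s['mixed']:.3f}")
```

Lower λ gives purer clonal beads at the cost of many empty droplets.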

Picotitre plate pyrosequencing (Roche – 454)
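Pyrosequencing flows one nucleotide species at a time; each incorporation releases pyrophosphate that is converted into light, so a homopolymer run gives a roughly proportionally larger signal in a single flow. The sketch below builds an idealized, noise-free flowgram and inverts it. The cyclic flow order `TACG` and the function names are assumptions for illustration.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order


def flowgram(template: str, n_flows: int, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal flowgram: number of bases incorporated during each nucleotide flow.

    `template` is the strand being synthesized, read 5' to 3'.
    """
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        count = 0
        # A whole homopolymer run of the flowed base is incorporated in one flow.
        while pos < len(template) and template[pos] == base:
            count += 1
            pos += 1
        signals.append(count)
    return signals


def call_bases(signals: list[int], flow_order: str = FLOW_ORDER) -> str:
    """Invert an ideal flowgram back to sequence."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(signals))


print(flowgram("TTAGGC", 8))  # [2, 1, 0, 2, 0, 0, 1, 0]
```

With real signals, telling a run of 5 from a run of 6 depends on light intensity, which is why homopolymers are the classic error source of this chemistry.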

• The major advance offered by NGS is the ability to produce an enormous volume of data cheaply
• The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research

NGS targeted re-sequencing – fishing out the regions of interest

Random genome sequencing | ??? | ??? | Sanger sequencing (targeted, 700-1000 bp)

• Library/template preparation
• Library enrichment for target regions
• Sequencing and imaging

Random genome sequencing | Hybrid capture | PCR-based | Sanger sequencing

Hybrid capture
In solution
• Agilent
• NimbleGen
• ...
Solid phase
• Agilent
• NimbleGen
• Febit
• ...

In solution
• Relatively cheap
• High throughput is possible
• Small amounts of DNA sufficient
Solid phase
• Straightforward method
• Flexible
• Higher amounts of DNA required

PCR-based enrichment
• Uniplex
• Multiplex
• Fluidigm
• RainDance
• Multiplicon
• Long-range PCR products
• RainDance

NGS workflow: data collection and processing – the exome sequencing pipeline

• The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons
• Protein-coding genes
◦ Constitute only approximately 1% of the human genome
◦ It is estimated that 85% of the mutations with large effects on disease-related traits can be found in exons or splice sites
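The size figures above can be cross-checked with simple arithmetic: the exome is about 1% of the genome, consistent with the protein-coding share, and the exon count implies an average exon length. The constants are the slide's rounded figures; the derived mean exon length is an estimate, not a slide figure.

```python
GENOME_BP = 3_000_000_000   # ~3 Gb
EXOME_BP = 30_000_000       # ~30 Mb
N_EXONS = 180_000

exome_fraction = EXOME_BP / GENOME_BP   # ~1% of the genome
mean_exon_bp = EXOME_BP / N_EXONS       # average exon length implied by the figures

print(f"exome = {exome_fraction:.1%} of genome, mean exon ~ {mean_exon_bp:.0f} bp")
# exome = 1.0% of genome, mean exon ~ 167 bp
```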

Exome sequencing cost
                              1/01/2010   1/08/2010   1/01/2011
Exome capture                      1100         860         300
Sequencing (2.5 Gbases)            5900        2600        1000
Total cost                         7000        3460        1300
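The cost figures above can be summarized in a few lines. This sketch assumes the three series are exome capture, sequencing of 2.5 Gbases, and their total (the numbers sum consistently under that reading); the currency is not stated on the slide, so none is assumed.

```python
dates = ["1/01/2010", "1/08/2010", "1/01/2011"]
capture = [1100, 860, 300]
sequencing = [5900, 2600, 1000]   # cost of sequencing 2.5 Gbases
SEQ_GBASES = 2.5

totals = [c + s for c, s in zip(capture, sequencing)]  # should reproduce the total-cost series
per_gbase = [s / SEQ_GBASES for s in sequencing]        # sequencing cost per Gbase
drop = 1 - totals[-1] / totals[0]                        # relative fall in total cost

for d, t, g in zip(dates, totals, per_gbase):
    print(f"{d}: total {t}, sequencing per Gbase {g:.0f}")
print(f"total cost fell by {drop:.0%} between the first and last date")
```

Under this reading, sequencing per Gbase falls from 2360 to 400 and the total cost drops by about 81% in one year.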

• HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run
• Exome throughput
◦ 96 exomes at 60x coverage per run
◦ 3000 exomes at 60x coverage per year
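The throughput figures can be checked against the specifications. A sketch using only the slide numbers (plus the 30 Mb exome size): back-to-back 10-day runs give an upper bound of about 3500 exomes per year, so 3000 presumably allows for downtime. Each exome receives roughly 4-6 Gbases of raw sequence, against 1.8 Gbases strictly needed on target at 60x; the difference is presumably absorbed by off-target reads, duplicates and uneven coverage.

```python
FLOW_CELLS = 2
GBASES_PER_FLOW_CELL = (200, 300)  # slide range
RUN_DAYS = 10
EXOMES_PER_RUN = 96
EXOME_MB = 30                      # exome size from the human genome slide
COVERAGE = 60

run_yield = [FLOW_CELLS * g for g in GBASES_PER_FLOW_CELL]   # Gbases per run
raw_per_exome = [y / EXOMES_PER_RUN for y in run_yield]       # raw Gbases per exome
on_target_needed = EXOME_MB * COVERAGE / 1000                 # Gbases on target at 60x
runs_per_year = 365 / RUN_DAYS                                 # back-to-back runs
max_exomes_per_year = runs_per_year * EXOMES_PER_RUN           # upper bound

print(run_yield, [round(x, 1) for x in raw_per_exome], on_target_needed, max_exomes_per_year)
```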

Data formatting & QC
Mapping & QC
Variant calling
Variant annotation
Variant filtering/comparison

DATA STORAGE | DATA GENERATION | DATA PROCESSING | REPORTING & VALIDATION | RESULTS | INTERPRETATION

• Prepare sample library
• Perform exome capture
• Perform sequencing

• Image processing
• Base calling
• Sequence data: 10-15 Gb / exome
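Base calling attaches a quality score to every called base, conventionally on the Phred scale, and these scores travel with the reads in FASTQ files. A minimal sketch of the conversions; the function names are illustrative, and the offset 33 assumes the Sanger/Illumina 1.8+ FASTQ encoding (older Illumina pipelines used 64).

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred quality score: Q = -10 * log10(probability the base call is wrong)."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Inverse mapping: Q20 means 1 error in 100, Q30 means 1 in 1000."""
    return 10 ** (-q / 10)


def decode_fastq_qualities(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


quals = decode_fastq_qualities("II5#")
print(quals, [error_from_phred(q) for q in quals])
```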

1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
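Duplicate marking (step 2) flags reads that most likely derive from the same original fragment, so PCR copies do not inflate the evidence for a variant. A simplified single-end sketch, similar in spirit to Picard MarkDuplicates but not its implementation: reads sharing chromosome, unclipped 5' position and strand form a group, and all but the read with the highest summed base quality are marked. The `Read` type is hypothetical; paired-end data would also key on the mate's position.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    five_prime: int        # unclipped 5' coordinate of the alignment
    reverse: bool          # strand
    qualities: list[int]   # per-base Phred scores
    duplicate: bool = False


def mark_duplicates(reads: list[Read]) -> list[Read]:
    """Flag all but the best-quality read in each (chrom, 5' position, strand) group."""
    groups: dict[tuple[str, int, bool], list[Read]] = {}
    for r in reads:
        groups.setdefault((r.chrom, r.five_prime, r.reverse), []).append(r)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.qualities))
        for r in group:
            r.duplicate = r is not best
    return reads
```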

• QC sequencing
• Mapping sequences
• QC capture experiment
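Two common ways to QC a capture experiment are the fraction of reads landing on target and the mean sequencing depth over the target regions. A minimal sketch, assuming aligned reads and non-overlapping targets on a single chromosome as half-open intervals; the names and toy coordinates are illustrative.

```python
Interval = tuple[int, int]  # half-open [start, end) on one chromosome


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def capture_qc(reads: list[Interval], targets: list[Interval]) -> dict[str, float]:
    """On-target read fraction and mean depth over the (non-overlapping) targets."""
    on_target = sum(any(overlap_bp(r, t) > 0 for t in targets) for r in reads)
    target_bp = sum(end - start for start, end in targets)
    covered_bp = sum(overlap_bp(r, t) for r in reads for t in targets)
    return {
        "on_target_fraction": on_target / len(reads),
        "mean_target_depth": covered_bp / target_bp,
    }


targets = [(100, 200), (300, 400)]
reads = [(100, 150), (150, 250), (500, 550), (310, 390)]
print(capture_qc(reads, targets))
```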

DATA STORAGE
• Sequence data: 10-15 Gb / exome
• Mapping results: 5 Gb / exome
DATA PROCESSING
• Variant calling
• Variant annotation
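Variant calling weighs the reads at each position against the possible diploid genotypes. A much-simplified likelihood sketch, not the model of any particular caller: bases are treated as independent, a sequencing error is assumed to turn a reference base into the alternate one (and vice versa) with probability `error`, and the genotype with the highest likelihood is reported under a flat prior. The 1% error rate is an assumption.

```python
import math


def genotype_log_likelihoods(ref_count: int, alt_count: int,
                             error: float = 0.01) -> dict[str, float]:
    """Log-likelihood of the observed ref/alt read counts under each diploid genotype."""
    lls = {}
    for name, alt_alleles in (("0/0", 0), ("0/1", 1), ("1/1", 2)):
        f = alt_alleles / 2                          # expected alt allele fraction
        p_alt = f * (1 - error) + (1 - f) * error    # chance a read shows the alt base
        lls[name] = alt_count * math.log(p_alt) + ref_count * math.log(1 - p_alt)
    return lls


def call_genotype(ref_count: int, alt_count: int, error: float = 0.01) -> str:
    """Maximum-likelihood genotype (flat prior over genotypes)."""
    lls = genotype_log_likelihoods(ref_count, alt_count, error)
    return max(lls, key=lls.get)
```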

DATA STORAGE
• Variant calls: 100 Mb / exome
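The per-exome storage figures add up quickly at production scale. A back-of-the-envelope sketch, assuming "Gb" in these figures means gigabytes of storage and using the 3000 exomes per year from the HiSeq throughput slide; the result is a rough estimate, not a slide figure.

```python
SEQUENCE_GB = (10, 15)    # sequence data per exome
MAPPING_GB = 5            # mapping results per exome
VARIANTS_GB = 0.1         # 100 Mb of variant calls per exome
EXOMES_PER_YEAR = 3000    # exome throughput figure from the HiSeq slide

per_exome = [s + MAPPING_GB + VARIANTS_GB for s in SEQUENCE_GB]   # Gb per exome
per_year_tb = [g * EXOMES_PER_YEAR / 1000 for g in per_exome]     # decimal TB per year

print(f"per exome: {per_exome[0]:.1f}-{per_exome[1]:.1f} Gb; "
      f"per year: {per_year_tb[0]:.0f}-{per_year_tb[1]:.0f} TB")
```

On these assumptions, a year of exomes needs on the order of 45-60 TB before any backups.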

Figure: variant counts by type (SNP, INDEL); y-axis 0-1,200,000
Figure: variant counts by functional category (non-coding; nonsynonymous and stopgain SNVs; frameshift and nonframeshift insertions and deletions); y-axis 0-1,000,000
Figure: coding variant counts (synonymous, nonsynonymous, stopgain and stoploss SNVs; frameshift deletions); y-axis 0-20,000
Figure: stoploss SNVs, stopgain SNVs and frameshift deletions; y-axis 0-500
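The SNV categories in these charts (synonymous, nonsynonymous, stopgain, stoploss) follow from comparing the reference and mutated codons. A minimal sketch using the standard genetic code; it assumes the codon is given on the transcript's coding strand with a known position of the change, and the function name is illustrative rather than any annotation tool's API.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Label a coding SNV with the category names used in the charts."""
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous SNV"
    if alt_aa == "*":
        return "stopgain SNV"
    if ref_aa == "*":
        return "stoploss SNV"
    return "nonsynonymous SNV"


print(classify_snv("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): stopgain SNV
```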

• Database of known variants (public & private)
• Variant filtering
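Filtering against a database of known variants, followed by comparison across patients, can be sketched as below. The damaging-effect set, the `Variant` type and the gene names in the usage are hypothetical choices for illustration; real studies tune these filters per project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    effect: str  # e.g. "nonsynonymous SNV", "synonymous SNV"


# Hypothetical filter setting: effects treated as potentially damaging.
DAMAGING = {"nonsynonymous SNV", "stopgain SNV", "stoploss SNV",
            "frameshift insertion", "frameshift deletion"}


def filter_variants(variants: list[Variant],
                    known: set[tuple[str, int, str, str]]) -> list[Variant]:
    """Keep variants absent from the known (public & private) set with a damaging effect."""
    return [v for v in variants
            if (v.chrom, v.pos, v.ref, v.alt) not in known and v.effect in DAMAGING]


def shared_genes(per_patient: list[list[Variant]]) -> set[str]:
    """Comparison step: genes carrying a filtered variant in every patient."""
    gene_sets = [{v.gene for v in vs} for vs in per_patient]
    return set.intersection(*gene_sets) if gene_sets else set()
```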

REPORTING & VALIDATION
RESULTS
• Validated variants in candidate genes
INTERPRETATION
