Ngs microbiome

Next-Generation Sequencing of Microbial Genomes and Metagenomes

Christine King
Farncombe Metagenomics Facility

Human Microbiome Journal Club
July 13, 2012

Overview
- Next-generation sequencing
  - Applications
  - Instruments
  - Library prep and sequencing chemistry
  - Sequence quality
- Project overview
  - Microbial genomes
  - Microbial communities

DNA Sequencing
- 1st generation
  - Sanger chain termination
  - Capillary electrophoresis
- 2nd generation (NGS)
  - High throughput, “massively parallel”
  - Shorter reads
  - Sequencing-by-synthesis
- 3rd generation
  - Single molecule

Applications
- DNA sequencing
  - De novo genomes
  - Resequencing
    - Shotgun (e.g. mutant strains)
    - Amplicon (e.g. HLA, cancer)
    - Sequence capture (e.g. exome)
  - Metagenome
    - Amplicon (e.g. 16S, COI, viral)
    - Shotgun
  - ChIP
- RNA sequencing
  - Gene expression
  - Gene annotation, splice variants

Instruments
Instrument    # of reads  Read length (bp)  Total output/run (Gb)  Cost per base  Run time  Technology
GS FLX        1M          450               0.5                    $$$$           ++        emPCR, SBS, light detection
GS FLX+       1M          650               0.6                    $$$$           ++        emPCR, SBS, light detection
GS Jr         100K        450               0.05                   $$$$           ++        emPCR, SBS, light detection
GAIIx         640M        2x150             90                     $$             +++       Bridge PCR, SBS, fluorophore
HiSeq 2000    6B          2x100             600                    $              +++       Bridge PCR, SBS, fluorophore
MiSeq         12M         2x150             2                      $$             ++        Bridge PCR, SBS, fluorophore
PacBio RS     >10K        >1000             0.01                   $$$$           +         Single-molecule seq, fluorophore
SOLiD 5500xl  1.4B        75 + 35           155                    $              +++       emPCR, probe ligation, fluorophore
Ion PGM 316   1M          >100              0.1                    $$$            +         emPCR, SBS, pH change
Ion PGM 318   6M          >100              1                      $$             +         emPCR, SBS, pH change
(emPCR = emulsion PCR; SBS = sequencing-by-synthesis)

Which instrument(s) to use?
- Read length vs number of reads
- Cost per base, per sample, per project (multiplexing?)
- Accuracy
- Run time, wait time

Application       Length  # reads  Accuracy  Instruments           Considerations
De novo (small)   +++     ++       ++        MiSeq, 454, Ion       Mix lengths
De novo (large)   +++     +++      ++        HiSeq, 454, SOLiD     Mix lengths, MP (mate pair)
Re-seq (small)    ++      ++       ++        MiSeq, Ion            Multiplex?
Re-seq (large)    ++      +++      ++        HiSeq, SOLiD          Enrichment?
RNA-seq (count)   +       +++      +         Illumina, SOLiD, Ion  Ref? Size? Rare?

Library Preparation
- Goal: fragments of DNA, each end flanked by adapter sequences
- Adapters contain amplification- and sequencing-primer binding sites; they are platform- and chemistry-specific
- Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing
- Library QC: quantity, size

Library Preparation
- Library types:
  - Shotgun (DNA)
    - May begin with ChIP
    - May follow with sequence capture
  - Mate pair (DNA)
  - Amplicon (DNA)
  - Total RNA
    - May enrich for mRNA (poly-A enrichment, rRNA depletion)
    - Convert to cDNA (then similar to DNA protocols)
  - Small RNA
    - RNA ligations, then convert to cDNA

Library Preparation: Shotgun
- Fragmentation
  - Sonication
  - Nebulization
  - Enzymatic
- End repair
  - 3’ overhangs digested
  - 5’ overhangs filled
  - 5’ phosphate added

Library Preparation: Shotgun
- Adapter ligation
  - T-overhangs
  - Forked structure controls orientation
- Library amplification
  - Few cycles
  - Enrich for correctly-adapted fragments
  - Required to complete adapter structure in some protocols
- Size selection
  - Gel excision, AMPure beads
  - Limit insert size as needed, remove artifacts

Library Preparation: Amplicon
- Amplify region of interest using PCR
- Primers contain adapter sequences

Library Preparation: Mate Pair
- Begin with large fragments (e.g. 3 kb, 20 kb)
- Circularize and fragment again
  - Illumina: direct ligation
  - 454: Cre/Lox recombination
- Enrich for fragments containing the junction
- Proceed with shotgun library prep

Library Preparation: Mate Pair
- Why? Paired sequences are a known distance apart, which improves genome assembly
- Note: 454 calls these “paired end libraries”, not to be confused with Illumina’s “paired end sequencing”!

Sequencing: Illumina
- Cluster generation
  - Library fragments hybridize to oligos on the flow cell
  - New strand synthesized; original denatured and removed
  - Free end binds to adjacent oligos (bridge formation)
  - Complementary strand synthesized, denatured (both tethered to flow cell)
  - Repeat to form clonal cluster
  - Cleave one oligo, denature to leave ssDNA clusters
  - ~800K clusters/mm^2

- Variety of workflows:
  - Single- or paired-end reads
  - 0, 1, or 2 index reads

- At each cycle, all 4 fluorescently-labeled nucleotides pass over the flow cell
- Each cluster incorporates one nt (terminator) per cycle
- Fluor is imaged, then cleaved
- De-block and repeat

- Other terminology:
  - cBot: accessory instrument that performs cluster generation
  - Lanes: divisions (8) of HiSeq and GAIIx flow cells
  - PhiX: bacteriophage with a small, balanced genome; a PhiX library is spiked in with samples for QC
  - Phasing/pre-phasing: nt incorporation falls behind or jumps ahead on a portion of strands in the cluster, contributing to noise
  - Chastity filter: measures signal purity (after intensity corrections); if the background signal is high, the cluster is discarded
  - BaseSpace: cloud computing site for processing MiSeq data

- File format: fastq
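The chastity filter can be illustrated with a small sketch. Illumina's documentation describes chastity as the brightest channel intensity divided by the sum of the brightest and second-brightest intensities, with a cluster passing filter if no more than one of the first 25 cycles falls below 0.6; this sketch assumes that rule and uses made-up four-channel intensities. The function names are hypothetical, and real base calling also corrects for cross-talk and phasing before this step.

```python
from typing import List, Sequence

BASES = "ACGT"  # channel order assumed for this sketch


def call_base(intensities: Sequence[float]) -> str:
    """Call the base whose channel is brightest in this cycle."""
    best = max(range(len(BASES)), key=lambda i: intensities[i])
    return BASES[best]


def chastity(intensities: Sequence[float]) -> float:
    """Brightest intensity divided by (brightest + second brightest)."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    return top / total if total > 0 else 0.0


def passes_filter(cycles: List[Sequence[float]], threshold: float = 0.6,
                  window: int = 25, max_failures: int = 1) -> bool:
    """True if at most `max_failures` of the first `window` cycles
    have chastity below `threshold`."""
    failures = sum(1 for c in cycles[:window] if chastity(c) < threshold)
    return failures <= max_failures


# Made-up A, C, G, T intensities for three cycles of one cluster.
cluster = [
    [900.0, 50.0, 40.0, 30.0],   # clean A
    [60.0, 70.0, 800.0, 55.0],   # clean G
    [400.0, 380.0, 20.0, 10.0],  # ambiguous A/C: low chastity
]
print("".join(call_base(c) for c in cluster))  # AGA
print([round(chastity(c), 2) for c in cluster])
print(passes_filter(cluster))
```

The ambiguous third cycle fails the 0.6 threshold, but a single failure is tolerated, so this cluster still passes; a second ambiguous cycle would cause it to be discarded.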

Sequencing: 454
- emPCR: clonal amplification of bead-bound library in microdroplets
- Library input amounts critical!
  - One molecule per bead
  - Titration procedure
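Why input amounts matter can be seen by assuming template molecules land on beads at random, so copies per bead follow a Poisson distribution (a standard modeling assumption, not stated on the slide). The sketch below computes the fraction of empty, clonal (exactly one template), and mixed beads for a few hypothetical copies-per-bead ratios.

```python
import math


def bead_occupancy(copies_per_bead: float) -> dict:
    """Poisson fractions of beads carrying 0, 1, or >= 2 template molecules."""
    lam = copies_per_bead
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return {"empty": p0, "clonal": p1, "mixed": 1.0 - p0 - p1}


for ratio in (0.1, 0.5, 1.0, 2.0):
    occ = bead_occupancy(ratio)
    # Mixed beads give polyclonal, unreadable wells; empty beads waste capacity.
    print(f"{ratio:>4} copies/bead  clonal={occ['clonal']:.3f}  mixed={occ['mixed']:.3f}")
```

Under this model the clonal fraction peaks at one copy per bead (about 37%), yet roughly 26% of beads are already mixed at that ratio, which is why library input is titrated rather than simply maximized.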

Sequencing: 454
- Library capture: beads coated with complementary oligo
- Amplification: droplet contains PCR reagents and the other oligo
- Post-PCR: millions of identical fragments attached to the bead

Sequencing: 454
- Bead recovery: physical and chemical disruption
- Enrichment: capture successfully amplified beads using biotinylated primers + magnetic streptavidin beads

Sequencing: 454
- Deposit bead layers onto PicoTiterPlate:
  - Enzyme beads
  - Enriched DNA beads
  - More enzyme beads
  - PPiase beads

Sequencing: 454
- Pyrosequencing
  - 4 nucleotides flow separately
  - If a nt is incorporated, PPi is released and converted to light:
    - APS + PPi (sulfurylase) → ATP
    - Luciferin + ATP (luciferase) → light + oxyluciferin
  - Amount of light proportional to # nt incorporated
  - Rinse and repeat with next nt

Sequencing: 454
- Camera captures light emitted from every well during every nucleotide flow

Sequencing: 454
- Flowgram: representation of a sequence, based on the pattern of light emitted from a single well

Sequencing: 454
- Other terminology:
  - Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
  - Titanium chemistry: ~450 bp reads on all instruments
  - XL+ chemistry: ~700 bp reads on the FLX+ instrument
  - Flow: one of the four nucleotides flows over the PTP
  - Cycle: a set of four flows, in order
  - Valley flow: a flow in which the number of bases incorporated in a given read is uncertain, e.g. 1.5 units of light (background signal, homopolymers)
- File format: sff (standard flowgram format)
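A flowgram can be turned back into bases by rounding each flow's signal to the nearest whole number of incorporated nucleotides. The sketch below assumes the TACG flow order used on 454 instruments and a hypothetical `valley_margin` of 0.3 for flagging uncertain flows; real 454 base calling applies further corrections, so treat this only as an illustration of why homopolymer length is the main source of error.

```python
import math
from typing import List, Tuple

FLOW_ORDER = "TACG"  # 454 flow cycle, assumed for this sketch


def flowgram_to_sequence(signals: List[float], flow_order: str = FLOW_ORDER,
                         valley_margin: float = 0.3) -> Tuple[str, List[int]]:
    """Convert per-flow light intensities into bases.

    Each signal is rounded (half up) to a homopolymer length. Flows whose
    signal is at least `valley_margin` away from that integer are reported
    as valley flows.
    """
    bases = []
    valleys = []
    for i, signal in enumerate(signals):
        nt = flow_order[i % len(flow_order)]
        count = max(0, math.floor(signal + 0.5))
        if abs(signal - count) >= valley_margin:
            valleys.append(i)
        bases.append(nt * count)
    return "".join(bases), valleys


seq, valleys = flowgram_to_sequence([1.02, 0.05, 2.1, 0.0, 0.9, 1.5, 0.1, 3.05])
print(seq)      # TCCTAAGGG
print(valleys)  # [5]
```

The flow with 1.5 units of light is called as "AA" but flagged, because the data cannot distinguish one A from two.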

Sequencing: Ion Torrent
- Procedures and chemistry similar to 454
- Instead of PPi, measure H+ release (pH change) via semiconductor chip
- No expensive camera or laser required, no modified nucleotides

Sequence Quality

Phred (Q) score  Probability of error (P)  Base call accuracy
10               1 in 10                   90%
20               1 in 100                  99%
30               1 in 1K                   99.9%
40               1 in 10K                  99.99%
50               1 in 100K                 99.999%

- Error probabilities determined using training sets, platform-specific biases
- Expressed as a quality value (QV or Q score) per base
- Similar to PHRED scores:
  - $Q = -10 \log_{10} P$
  - $P = 10^{-Q/10}$
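The two formulas and the table can be checked with a few lines of Python. The sketch also decodes an ASCII-encoded fastq quality string, assuming the Phred+33 offset used by Illumina 1.8+ output; older pipelines used other offsets, so the `offset` parameter is exposed.

```python
import math
from typing import List


def q_to_error(q: float) -> float:
    """P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_q(p: float) -> float:
    """Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_qualities(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 assumed) into Q scores."""
    return [ord(ch) - offset for ch in qual]


for q in (10, 20, 30, 40, 50):
    p = q_to_error(q)
    print(f"Q{q}: P = {p:g}, accuracy = {100 * (1 - p):.3f}%")

print(decode_qualities("II5+"))  # [40, 40, 20, 10]
```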

Project 1: Microbial Genome
- Considerations:
  - Reference genome?
  - How much coverage do I want?
  - How big is the genome?
  - How much data do I need?
    - bp needed = genome size × coverage
  - Which instrument/chemistry configuration to use?
- Coverage
  - Depth: number of times a particular base is “covered” by a read (e.g. 25X)
  - Breadth: % of genome with at least 1X coverage
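The "bp needed = genome size × coverage" rule turns directly into a read count. The sketch below uses a hypothetical 5 Mb genome at 25X with 2 x 150 bp reads, and also computes depth and breadth from read positions on a reference, given as 0-based, half-open (start, end) intervals (an assumption of this sketch). It ignores reads lost to filtering, PhiX, and uneven coverage, so real projects budget extra.

```python
import math
from typing import List, Tuple


def bases_needed(genome_size_bp: int, coverage: float) -> float:
    """bp needed = genome size x coverage."""
    return genome_size_bp * coverage


def read_pairs_needed(genome_size_bp: int, coverage: float,
                      read_length: int, paired: bool = True) -> int:
    """Reads (or read pairs) required to reach the target depth."""
    bp_per_fragment = read_length * (2 if paired else 1)
    return math.ceil(bases_needed(genome_size_bp, coverage) / bp_per_fragment)


def depth_and_breadth(genome_len: int,
                      reads: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Mean depth and breadth (fraction of bases covered >= 1X)."""
    diff = [0] * (genome_len + 1)  # difference array of read starts/ends
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    total_depth, covered, running = 0, 0, 0
    for pos in range(genome_len):
        running += diff[pos]
        total_depth += running
        if running > 0:
            covered += 1
    return total_depth / genome_len, covered / genome_len


print(bases_needed(5_000_000, 25))            # 125000000
print(read_pairs_needed(5_000_000, 25, 150))  # 416667
print(depth_and_breadth(10, [(0, 5), (3, 8)]))  # (1.0, 0.8)
```

At the ~2 Gb per run listed for the MiSeq in the instrument table, 125 Mb per genome works out to roughly 16 such genomes per run before any losses.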

- Sample preparation
  - Isolate high-quality (not degraded) and high-purity (no RNA) gDNA
  - Verify on a gel
  - Quantify using a dsDNA-specific dye

- Library preparation
  - Can do this yourself if you like
  - ~$200 per sample for Nextera
    - Cheaper protocols
    - Cheaper in bulk
  - Barcode compatibility

- Library QC
  - Insert size confirmed on BioAnalyzer (within range, no artifacts)
  - Pool barcoded libraries (normalize based on PicoGreen quantification)
  - Absolute quantification of library pools using qPCR
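Normalizing barcoded libraries before pooling means converting each mass concentration into molarity using its fragment size, then adding equal moles of each. The sketch assumes the common average of 660 g/mol per base pair of dsDNA; the library names, concentrations, sizes, and 50 fmol target are hypothetical. The qPCR step on the slide remains the absolute check on the final pool.

```python
from typing import Dict, Tuple

BP_MASS_G_PER_MOL = 660.0  # average mass of one double-stranded base pair


def ng_per_ul_to_nM(conc_ng_per_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * fragment_bp)


def pooling_volumes(libraries: Dict[str, Tuple[float, float]],
                    fmol_each: float) -> Dict[str, float]:
    """Microlitres of each library that contribute `fmol_each` femtomoles.

    `libraries` maps a library name to (ng/ul, mean fragment size in bp).
    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    return {name: fmol_each / ng_per_ul_to_nM(conc, size)
            for name, (conc, size) in libraries.items()}


# Hypothetical PicoGreen concentrations and BioAnalyzer sizes.
libs = {"lib01": (10.0, 500.0), "lib02": (4.0, 400.0)}
for name, vol in pooling_volumes(libs, fmol_each=50.0).items():
    print(f"{name}: {vol:.2f} ul")
```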

- MiSeq sequencing
  - Dilute and denature library pool (optimal concentration requires titration...)
  - Spike in PhiX library as needed (e.g. 1%)
  - Prepare and load reagents, flow cell
  - Basic filtering and de-multiplexing performed automatically
  - Download fastq files from BaseSpace
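Each fastq record is four lines: an `@` header, the sequence, a `+` separator, and a quality string of the same length. The reader below is a minimal sketch assuming that four-line layout (no wrapped sequences) and Phred+33 qualities; the `FastqRecord` and `read_fastq` names are hypothetical, and for real work a maintained parser is the safer choice.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a 4-line-per-record fastq stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


example = io.StringIO(
    "@read1\nACGTTGCA\n+\nIIIIHH#!\n"
    "@read2\nGGGCCCAA\n+\nIIIIIIII\n"
)
for rec in read_fastq(example):
    mean_q = sum(rec.qual) / len(rec.qual)
    print(rec.name, rec.seq, f"mean Q = {mean_q:.1f}")
```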

- Data processing
  - Additional filtering
  - Trim the ends
  - Remove PCR duplicates
- Assembly: overlapping reads are assembled to each other based on sequence similarity = contigs
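The processing and assembly steps can be illustrated on toy data. The sketch below trims low-quality 3' ends (a hypothetical Q20 cutoff), drops exact duplicate reads as a crude stand-in for PCR-duplicate removal (real tools often use alignment coordinates instead), and joins two reads by their longest exact suffix-prefix overlap. Real assemblers build overlap or de Bruijn graphs that tolerate errors and resolve repeats; this only shows the core idea of reads overlapping into a contig.

```python
from typing import List, Optional, Tuple


def trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Remove trailing bases until a base with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def remove_exact_duplicates(seqs: List[str]) -> List[str]:
    """Keep the first copy of each identical read (a crude PCR-duplicate proxy)."""
    seen = set()
    unique = []
    for s in seqs:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique


def merge_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    The longest qualifying overlap wins; returns None if none is found.
    """
    min_overlap = max(min_overlap, 1)
    for olen in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-olen:] == right[:olen]:
            return left + right[olen:]
    return None


seq, quals = trim_3prime("ACGTAC", [38, 38, 35, 30, 12, 2])
print(seq)  # ACGT
print(remove_exact_duplicates(["ACGT", "ACGT", "TTGA"]))  # ['ACGT', 'TTGA']
print(merge_overlap("ACGTTGCA", "TGCATTAG"))  # ACGTTGCATTAG
```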

- What’s next?
  - Polish the genome (hybrid assemblies, mate pair libraries)
  - Annotate (ORFs, RNA-seq)
  - Compare
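As a first step toward annotation, open reading frames can be located by scanning for ATG...stop runs. The sketch below scans only the three forward-strand frames, treats ATG as the sole start codon, and uses a hypothetical `min_codons` cutoff; bacterial genes also use GTG/TTG starts and occur on both strands, which dedicated gene callers handle.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    `min_codons` counts codons from ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


dna = "CCATGAAATTTTAGGG"
for frame, start, end in find_orfs(dna):
    print(frame, start, end, dna[start:end])  # 2 2 14 ATGAAATTTTAG
```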

Project 2: Microbial Community
- Shotgun metagenomics
  - Unbiased survey of community content
  - Random library fragments may provide very little taxonomic resolution (e.g. conserved, unknown)
  - Identify genes, classify by function
- Targeted metagenomics
  - Limited survey of community content
  - Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa
  - Identify OTUs, classify by taxonomy

- 16S rRNA
  - Multi-copy gene (1.5 kb)
  - Conserved and hypervariable regions
  - Extensive databases from known species

 Considerations:  Sample preparation:
 Biases in sampling  Isolate
DNA
methods, culturing,  PCR amplify, purify
DNA isolation,  High-fidelity
PCR...replicate polymerase
 Available SOPs  Barcoded primers

 How many reads per  No primer dimers!

sample?  NormalizePCR
 Read length products and pool
matters!

 454 Sequencing  Data processing
 emPCR titrations  De-multiplexing
with different library  Additionalfiltering
input  Trim the barcodes,
 Bulk emPCR primers
 Sequence  Check for chimeras
 Basic filtering

 Collect sff files
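De-multiplexing and barcode/primer trimming can be sketched as follows: each read is assigned to a sample by an exact match to its 5' barcode, the primer that follows is checked allowing a small number of mismatches, and both are trimmed off. The barcodes, primer, mismatch allowance, and function name are hypothetical; chimera checking is a separate step not shown here.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], barcodes: Dict[str, str], primer: str,
                max_primer_mismatches: int = 1
                ) -> Tuple[Dict[str, List[str]], List[str]]:
    """Sort reads by an exact 5' barcode match, then trim barcode and primer.

    `barcodes` maps barcode sequence -> sample name (equal-length barcodes
    assumed). Reads with no barcode, or whose primer region has more than
    `max_primer_mismatches` mismatches, are returned as unassigned.
    """
    by_sample: Dict[str, List[str]] = {name: [] for name in barcodes.values()}
    unassigned: List[str] = []
    for read in reads:
        for bc, sample in barcodes.items():
            if read.startswith(bc):
                rest = read[len(bc):]
                mismatches = sum(a != b for a, b in zip(rest, primer))
                if len(rest) >= len(primer) and mismatches <= max_primer_mismatches:
                    by_sample[sample].append(rest[len(primer):])
                else:
                    unassigned.append(read)
                break
        else:  # no barcode matched
            unassigned.append(read)
    return by_sample, unassigned


# Hypothetical barcodes and primer, for illustration only.
reads = ["AAGGGTGCCAGCTTACGGA", "CCTTGTGCCTGCGGATC", "TTTTGTGCCAGCAAAA"]
samples, unassigned = demultiplex(reads, {"AAGG": "s1", "CCTT": "s2"}, "GTGCCAGC")
print(samples)     # {'s1': ['TTACGGA'], 's2': ['GGATC']}
print(unassigned)  # ['TTTTGTGCCAGCAAAA']
```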

- Clustering
  - Sequences grouped by similarity = OTUs
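OTU clustering is often done greedily: unique sequences are processed from most to least abundant, and each either joins the first existing centroid within a similarity threshold (97% is a common convention) or founds a new OTU. The sketch below assumes reads have already been trimmed and aligned to equal length, so identity is a simple position-by-position match; real clustering tools compute identity from pairwise alignments.

```python
from collections import Counter
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Abundance-ordered greedy clustering; returns centroid -> member reads."""
    counts = Counter(seqs)
    otus: Dict[str, List[str]] = {}
    for seq, n in counts.most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].extend([seq] * n)
                break
        else:  # no centroid close enough: start a new OTU
            otus[seq] = [seq] * n
    return otus


base = "A" * 100
variant = "A" * 98 + "CC"        # 98% identical to base
distinct = "C" * 10 + "A" * 90   # 90% identical to base
otus = greedy_otus([base] * 3 + [variant] + [distinct] * 2)
for centroid, members in otus.items():
    print(centroid[:12], len(members))
```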

- Taxonomic identification
  - OTUs are classified by comparing to known 16S sequences
  - Level of classification (e.g. family vs genus)?

- Diversity
  - Within sample
  - Between samples
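Within-sample (alpha) and between-sample (beta) diversity are commonly summarized with indices computed from OTU counts. The slides do not name specific metrics, so the Shannon index, Gini-Simpson index, and Bray-Curtis dissimilarity below are illustrative choices, applied to hypothetical counts.

```python
import math
from typing import Dict


def shannon(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i ln p_i) over OTUs with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def simpson(counts: Dict[str, int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def bray_curtis(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two samples (0 = identical)."""
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))


# Hypothetical OTU counts for two samples.
s1 = {"OTU1": 50, "OTU2": 50}
s2 = {"OTU1": 100}
print(round(shannon(s1), 3), simpson(s1), bray_curtis(s1, s2))  # 0.693 0.5 0.5
```

In practice samples are often subsampled to equal read depth before comparison, since these indices are sensitive to how many reads each sample received.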
