Ngs microbiome
1. Next-Generation Sequencing of Microbial
Genomes and Metagenomes
Christine King
Farncombe Metagenomics Facility
Human Microbiome Journal Club
July 13, 2012
7. Which instrument(s) to use?
Read length vs number of reads
Cost per base, per sample, per project (multiplexing?)
Accuracy
Run time, wait time
Application       Length  # Reads  Accuracy  Instruments           Considerations
De novo (small)   +++     ++       ++        MiSeq, 454, Ion       Mix lengths
De novo (large)   +++     +++      ++        HiSeq, 454, SOLiD     Mix lengths, MP
Re-seq (small)    ++      ++       ++        MiSeq, Ion            Multiplex?
Re-seq (large)    ++      +++      ++        HiSeq, SOLiD          Enrichment?
RNA-seq (count)   +       +++      +         Illumina, SOLiD, Ion  Ref? Size? Rare?
8. Library Preparation
Goal: fragments of DNA, each end flanked by adaptor
sequences
Adaptors contain amplification- and sequencing primer
binding sites; platform- and chemistry-specific
Optional: sample-specific barcodes/indexes/MIDs/tags
allow multiplexing during sequencing
Library QC: quantity, size
9. Library Preparation
Library types:
Shotgun (DNA)
May begin with ChIP
May follow with sequence capture
Mate pair (DNA)
Amplicon (DNA)
Total RNA
May enrich for mRNA (poly-A enrichment, rRNA depletion)
Convert to cDNA (then similar to DNA protocols)
Small RNA
RNA ligations, convert to cDNA after
13. Library Preparation: Mate Pair
Begin with large
fragments (e.g. 3kb,
20kb)
Circularize and
fragment again
Illumina: direct ligation
454: Cre/Lox
recombination
Enrich for fragments
containing the junction
Proceed with shotgun
library prep
14. Library Preparation: Mate Pair
Why? Paired
sequences are a known
distance apart;
improves genome
assembly
Note: 454 calls these
“paired end libraries”,
not to be confused with
Illumina’s “paired end
sequencing”!
15. Sequencing: Illumina
Cluster generation
Library fragments hybridize
to oligos on the flow cell
New strand synthesized,
original denatured,
removed
Free end binds to adjacent
oligos (bridge formation)
Complementary strand
synthesized, denatured
(both tethered to flow cell)
Repeat to form clonal
cluster
Cleave one oligo, denature
to leave ssDNA clusters
~800K clusters/mm^2
16. Sequencing: Illumina
Variety of workflows:
Single- or paired end reads
0, 1, or 2 index reads
17. Sequencing: Illumina
At each cycle, all 4 fluorescently-labeled
nucleotides pass over the flow cell
Each cluster incorporates one nt (terminator) per
cycle
Fluor is imaged, then cleaved
De-block and repeat
18. Sequencing: Illumina
Other terminology:
cBot – accessory instrument that performs cluster
generation
Lanes – divisions (8) of HiSeq and GAIIx flow cells
PhiX – bacteriophage with small, balanced genome; PhiX
library spiked in with samples for QC
Phasing/pre-phasing – nt incorporation falls behind or
jumps ahead on a portion of strands in the cluster and
contributes to noise
Chastity filter – measures signal purity (after intensity
corrections); if the background signal is high, cluster will be
discarded
BaseSpace – cloud computing site for processing MiSeq
data
File format: fastq
19. Sequencing: 454
emPCR: clonal
amplification of
bead-bound library
in microdroplets
Library input
amounts critical!
One molecule per
bead
Titration procedure
20. Sequencing: 454
Library capture:
beads coated with
complementary oligo
Amplification:
droplet contains
PCR reagents and
the other oligo
Post-PCR: millions
of identical
fragments attached
to the bead
21. Sequencing: 454
Bead Recovery:
physical and
chemical disruption
Enrichment: capture
successfully
amplified beads
using biotinylated
primers + magnetic,
streptavidin beads
24. Sequencing: 454
Pyrosequencing
4 nucleotides flow
separately
If nt incorporation -> PPi -> light
APS + PPi -> ATP (sulfurylase)
Luciferin + ATP -> light + oxyluciferin (luciferase)
Amount of light
proportional to #nt
incorporated
Rinse and repeat with next
nt
25. Sequencing: 454
Camera captures light
emitted from every well
during every nucleotide flow
26. Sequencing: 454
Flowgram: representation of a sequence, based on the
pattern of light emitted from a single well
27. Sequencing: 454
Other terminology:
Lib-L/Lib-A: adapter variants, “ligated” or “annealed”
Titanium chemistry: ~450 bp reads on all instruments
XL+ chemistry: ~700 bp reads on the FLX+ instrument
Flow: one of the four nucleotides flows over the PTP
Cycle: a set of four flows, in order
Valley flow: if number of bases incorporated in a given
read during that flow is uncertain, e.g. 1.5 units of light
(background signal, homopolymers)
File format: sff (standard flowgram format)
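The flow, cycle and valley-flow terms above can be illustrated with a minimal sketch that turns a flowgram into bases. It assumes a flow order of T, A, C, G and a valley window of 0.4 to 0.6 fractional light units; both values, along with the function name `call_flowgram`, are illustrative assumptions rather than anything stated on the slides.

```python
FLOW_ORDER = "TACG"  # assumed nucleotide flow order, for illustration only


def call_flowgram(signals, flow_order=FLOW_ORDER, valley=(0.4, 0.6)):
    """Convert per-flow light signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    bases. Flows whose fractional part falls inside the `valley` window
    are reported as uncertain (e.g. 1.5 units of light).
    Returns (sequence, list of valley flow indices).
    """
    bases = []
    valley_flows = []
    for i, signal in enumerate(signals):
        signal = max(signal, 0.0)            # treat negative noise as zero
        nt = flow_order[i % len(flow_order)]  # nucleotide flowed at step i
        n = int(signal + 0.5)                # round half up
        frac = signal - int(signal)
        if valley[0] <= frac <= valley[1]:
            valley_flows.append(i)
        bases.append(nt * n)
    return "".join(bases), valley_flows


# Two cycles (eight flows) from a hypothetical well
seq, ambiguous = call_flowgram([1.02, 0.05, 2.10, 0.98, 0.03, 1.52, 0.0, 1.0])
print(seq, ambiguous)
```

The sixth flow (1.52 units) is called as two bases but flagged, which is the kind of homopolymer uncertainty the valley-flow term describes.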
28. Sequencing: Ion Torrent
Procedures and
chemistry similar to 454
Instead of PPi, measure
H+ release (pH change)
via semiconductor chip
No expensive camera or
laser required, no
modified nucleotides
29. Sequence Quality
Phred (Q) Score   Probability of Error (P)   Base Call Accuracy
10                1 in 10                    90%
20                1 in 100                   99%
30                1 in 1K                    99.9%
40                1 in 10K                   99.99%
50                1 in 100K                  99.999%
Error probabilities determined using training sets,
platform-specific biases
Expressed as a quality value (QV or Q score) per base
Similar to PHRED scores:
Q = -10 log10(P)
P = 10^(-Q/10)
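As a worked example of the Q and P relationship, the sketch below converts in both directions and decodes a FASTQ quality string. The ASCII offset of 33 is an assumption about the file encoding (other offsets exist), and the function names are illustrative.

```python
import math


def phred_to_prob(q):
    """Error probability P for a Phred quality score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_to_phred(p):
    """Phred quality score Q for an error probability P: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual, offset=33):
    """Decode a FASTQ quality string into per-base Q scores.

    Assumes each character encodes Q as chr(Q + offset).
    """
    return [ord(c) - offset for c in qual]


print(phred_to_prob(30))            # error probability for Q30
print(prob_to_phred(0.01))          # Q score for a 1-in-100 error rate
print(decode_fastq_quality("I5+"))  # per-base Q scores
```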
30. Project 1: Microbial Genome
Considerations:
Reference genome?
How much coverage do I want?
How big is the genome?
How much data do I need?
bp needed = genome size X coverage
Which instrument/chemistry configuration to use?
Coverage:
Depth (number of times a particular base is “covered” by a read, e.g. 25X)
Breadth (% of genome with at least 1X coverage)
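The "bp needed = genome size X coverage" rule can be turned into a quick planning calculation. This is a rough sketch: the 5 Mb genome, 150 bp read length and 1.5 Gb run yield are hypothetical numbers chosen for illustration, and it ignores reads lost to filtering.

```python
import math


def bases_needed(genome_size_bp, depth):
    """bp needed = genome size x coverage depth."""
    return genome_size_bp * depth


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Number of reads of a given length needed to reach the target depth."""
    return math.ceil(bases_needed(genome_size_bp, depth) / read_length_bp)


def samples_per_run(run_yield_bp, genome_size_bp, depth):
    """How many genomes of this size fit in one run at the target depth."""
    return run_yield_bp // bases_needed(genome_size_bp, depth)


genome = 5_000_000  # hypothetical 5 Mb bacterial genome
print(bases_needed(genome, 25))                    # bp for 25X depth
print(reads_needed(genome, 25, 150))               # 150 bp reads required
print(samples_per_run(1_500_000_000, genome, 25))  # genomes per 1.5 Gb run
```

The last figure is what drives the multiplexing decision: if several genomes fit in one run, barcoding them together lowers the cost per sample.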
31. Project 1: Microbial Genome
Sample preparation
Isolate high quality (not
degraded) and high purity (no
RNA) gDNA
Verify on a gel
Quantify using dsDNA-specific
dye
Library preparation
Can do this yourself if you like
~ $200 per sample for Nextera
Cheaper protocols
Cheaper in bulk
Barcode compatibility
32. Project 1: Microbial Genome
Library QC
Insert size confirmed on BioAnalyzer (within
range, no artifacts)
Pool barcoded libraries (normalize based on
PicoGreen quantification)
Absolute quantification of library pools using
qPCR
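Normalizing barcoded libraries before pooling comes down to converting a mass concentration to a molar one and then taking equal molar amounts. The sketch below assumes an average mass of 660 g/mol per base pair of dsDNA and a target of 20 fmol per library; the library names, concentrations and fragment sizes are hypothetical.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration from ng/uL to nM.

    Assumes an average of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def equimolar_volumes(libraries, fmol_each=20.0):
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    `libraries` maps a name to (ng/uL, mean fragment size in bp).
    Uses the identity 1 nM = 1 fmol/uL.
    """
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical PicoGreen readings and BioAnalyzer sizes
pool = equimolar_volumes({"lib_A": (6.6, 500), "lib_B": (3.3, 500)})
for name, volume in pool.items():
    print(f"{name}: {volume:.2f} uL")
```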
33. Project 1: Microbial Genome
MiSeq sequencing
Dilute and denature library pool (optimal
concentration requires titration...)
Spike in PhiX library as needed (e.g. 1%)
Prepare and load reagents, flow cell
Basic filtering and de-multiplexing performed
automatically
Download fastq files from BaseSpace
34. Project 1: Microbial Genome
Data processing:
Additional filtering
Trim the ends
Remove PCR duplicates
Assembly:
overlapping reads are assembled to each other
based on sequence similarity = contigs
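To make the duplicate-removal and overlap ideas concrete, here is a toy sketch. It treats duplicates as exact sequence copies and joins two reads on an exact suffix/prefix match; real pipelines tolerate sequencing errors and use far more elaborate methods, so this is an illustration under simplifying assumptions, not a description of any particular assembler.

```python
def remove_exact_duplicates(reads):
    """Keep the first copy of each read sequence, preserving order."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def overlap_length(a, b, min_overlap=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge_reads(a, b, min_overlap=3):
    """Join two overlapping reads into one contig, or return None."""
    n = overlap_length(a, b, min_overlap)
    return a + b[n:] if n else None


reads = remove_exact_duplicates(["ATGCGTAC", "ATGCGTAC", "GTACGGA"])
contig = merge_reads(reads[0], reads[1])
print(reads, contig)
```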
36. Project 2: Microbial Community
Shotgun metagenomics:
Unbiased survey of community content
Random library fragments may provide very little
taxonomic resolution (e.g. conserved, unknown)
Identify genes, classify by function
Targeted metagenomics:
Limited survey of community content
Targeted loci provide excellent taxonomic
resolution, but may exclude certain taxa
Identify OTUs, classify by taxonomy
37. Project 2: Microbial Community
16S rRNA
Multi-copy gene (1.5
kb)
Conserved and
hypervariable regions
Extensive databases
from known species
38. Project 2: Microbial Community
Considerations:
Biases in sampling methods, culturing,
DNA isolation, PCR...replicate
Available SOPs
How many reads per sample?
Read length matters!
Sample preparation:
Isolate DNA
PCR amplify, purify
High-fidelity polymerase
Barcoded primers
No primer dimers!
Normalize PCR products and pool
39. Project 2: Microbial Community
454 Sequencing:
emPCR titrations with different library input
Bulk emPCR
Sequence
Basic filtering
Collect sff files
Data processing:
De-multiplexing
Additional filtering
Trim the barcodes, primers
Check for chimeras
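The de-multiplexing and barcode/primer trimming steps can be sketched as follows. The barcodes, primer and sample names are made up for illustration, and the matching is exact; production tools usually allow a small number of mismatches.

```python
def demultiplex(reads, barcodes, primer):
    """Assign reads to samples by leading barcode, then trim barcode + primer.

    `barcodes` maps a barcode sequence to a sample name. Reads that do not
    start with a known barcode followed by the primer are left unassigned.
    Returns (dict of sample -> trimmed reads, list of unassigned reads).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            prefix = barcode + primer
            if read.startswith(prefix):
                by_sample[sample].append(read[len(prefix):])
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned


# Hypothetical barcodes and primer
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
assigned, leftover = demultiplex(
    ["ACGTGGATTTTCCC", "TGCAGGATAAAGGG", "CCCCGGATAAAA"],
    barcodes,
    primer="GGAT",
)
print(assigned, leftover)
```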
40. Project 2: Microbial Community
Clustering
Sequences grouped
by similarity = OTUs
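Grouping by similarity can be sketched as a greedy procedure: each sequence joins the first existing cluster whose representative it matches closely enough, otherwise it founds a new one. The sketch assumes sequences already trimmed to equal length and compared position by position, and the 0.97 default threshold is a commonly used convention, not a value given on the slides.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy clustering: the first member of each OTU is its representative."""
    representatives = []
    otus = []
    for seq in seqs:
        for i, rep in enumerate(representatives):
            if identity(seq, rep) >= threshold:
                otus[i].append(seq)
                break
        else:
            representatives.append(seq)
            otus.append([seq])
    return otus


# Toy 10-nt reads, so a looser threshold is used for the demonstration
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "AAAAATTTTT"]
clusters = greedy_otus(toy, threshold=0.9)
print([len(c) for c in clusters])
```

Because the result depends on input order, tools that work this way typically sort sequences (for example by abundance) before clustering.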
41. Project 2: Microbial Community
Taxonomic
identification
OTUs are classified by
comparing to known
16S sequences
Level of classification
(e.g. family vs
genus)?
Diversity
Within sample
Between samples
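The slides do not name specific diversity measures. As one illustration, the sketch below uses the Shannon index for within-sample diversity and Bray-Curtis dissimilarity for between-sample comparison; both are assumed choices, and the OTU counts are invented.

```python
import math


def shannon(counts):
    """Within-sample diversity: Shannon index (natural log) from OTU counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Between-sample dissimilarity from two OTU count vectors.

    0 means identical composition, 1 means no shared OTUs.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Hypothetical OTU count tables (same OTU order in both samples)
sample_1 = [10, 10, 10, 10]
sample_2 = [40, 0, 0, 0]
print(shannon(sample_1), shannon(sample_2))
print(bray_curtis(sample_1, sample_2))
```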