3. Biocomputing Research Consulting and
Scientific Software Development
High Throughput, Illustration, Animation
http://www.niaid.nih.gov/about/organization/odoffices/omo/ocicb/Pages/bcbb.aspx
ScienceApps@niaid.nih.gov
5. The Microbiome
“The ecological community of
commensal, symbiotic, and
pathogenic microorganisms that
literally share our body space”
(Lederberg and McCray 2001)
9. Human Microbiome
§ The human body contains approximately 10x as many
microbes as human cells, including bacteria, archaea,
fungi, and viruses (about 10^14 vs 10^13).
§ “Metagenome” or our “other genome”
• Includes all genes from bacteria, etc.
• About 10,000 microbial species
§ First introduction occurs at birth
§ Microbes provide enzymes for digestion and other
compounds such as vitamins
Berg, Trends in Microbiology, 1996
http://www.nih.gov/news/health/jun2012/nhgri-13.htm
Huse et al., PLoS ONE, 2012
10. Human Microbiome Project
§ Funded by NIH Common Fund,
FY2007-2015
§ “Develop tools and datasets for the
research community for studying
the role of these microbes in human
health and disease.”
§ Phase I (2007-2012)
• Composition and Diversity of
microbial communities
• Sequencing 3000 reference
genomes
§ Phase II (2013-2015)
• Integrated analysis of host and
microbiome in human health and
disease
§ Primary focus on bacterial
microbiome
§ “Your mouth is connected to your
rectum.” :)
• Pat Schloss
http://commonfund.nih.gov/hmp/index
11. Microbiome Analysis
§ Identifying microbial populations in various body tissues
and how changes in these populations correlate to various
disease states
§ Techniques
• Whole genome shotgun (WGS) sequencing
– Sampling all genes of all organisms in a sample
– Goal is to determine functional groups of genes
• 16S rRNA metagenomic sequencing
– Targeted amplicon sequencing of all 16S rRNA genes in a
population of microbes
– Goal is to determine taxonomic distribution of microbial species
• Microbial metatranscriptomics
– RNA-seq of all organisms in a population
12. Carriage of microbial taxa varies while metabolic
pathways remain stable within a healthy population
C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234
WGS vs. 16S
13. 16S rRNA Variable Regions
Slide modified from J. Wan
Regions shown: V1-V3, V3-V5, V6-V9
§ Part of the 30S subunit of the
prokaryotic ribosome
§ Widely conserved (bacteria, archaea)
§ 9 hypervariable regions, flanked by
conserved sequences
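Because the hypervariable regions are flanked by conserved sequences, primers that match the conserved flanks can pull out the variable stretch between them. The sketch below mimics that idea in silico on toy strings. The flank sequences, the two "taxa", and the function name `extract_region` are all hypothetical and chosen only for illustration; they are not real primers or real 16S genes.

```python
def extract_region(gene, left_flank, right_flank):
    """Return the sequence between two conserved flanks, or None if a flank is missing."""
    start = gene.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = gene.find(right_flank, start)
    if end == -1:
        return None
    return gene[start:end]


# Hypothetical conserved flanks shared by both toy "genes".
LEFT_FLANK = "GTGCCAGC"
RIGHT_FLANK = "ATTAGATACC"

# Two toy taxa: identical flanks, different variable region in between.
taxon_a = "AAAC" + LEFT_FLANK + "TTGACGTA" + RIGHT_FLANK + "GGT"
taxon_b = "AAAC" + LEFT_FLANK + "TCGATGCA" + RIGHT_FLANK + "GGT"

region_a = extract_region(taxon_a, LEFT_FLANK, RIGHT_FLANK)
region_b = extract_region(taxon_b, LEFT_FLANK, RIGHT_FLANK)
print("taxon A variable region:", region_a)
print("taxon B variable region:", region_b)
print("regions differ:", region_a != region_b)
```

The same flanks locate the region in both sequences, while the extracted stretch is what distinguishes the two taxa.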
14. Illumina: Advantages and Challenges
                  454            Illumina MiSeq        Illumina HiSeq
Reads / run       1 million      25 million            300 million – 2 billion
Max read length   400 – 700 bp   2 x 300 bp (paired)   2 x 150 bp (paired)
Error rate        moderate       low                   low
Cost / Mb         $7 – $22       ~$0.50                $0.04 – $0.074
• Long reads offer greater taxonomic information (454, MiSeq)
• Low error rates produce more accurate data (MiSeq, HiSeq)
• Current tools aren’t designed to cluster and classify non-overlapping
paired ends (MiSeq, HiSeq)
• Higher sequencing depth offers greater sensitivity for detection
http://www.illumina.com/systems/sequencing.ilmn
http://nextgenseek.com/2012/08/comparing-price-and-tech-specs-of-illumina-miseq-ion-torrent-pgm-454-gs-junior-and-pacbio-rs/
Slide modified from J. Wan
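Whether paired ends overlap depends on read length relative to amplicon length. The arithmetic can be sketched as below, using the two paired read lengths from the table. The amplicon lengths (253 and 450 bp) and the function name `paired_end_overlap` are assumptions made for illustration, not values from the slide.

```python
def paired_end_overlap(read_length, amplicon_length):
    """Bases covered by both reads of a pair; a negative value is the size of the gap."""
    read1_end = min(read_length, amplicon_length)          # read 1 starts at position 0
    read2_start = max(0, amplicon_length - read_length)    # read 2 ends at the amplicon end
    return read1_end - read2_start


# Hypothetical amplicon lengths, chosen only for illustration.
for amplicon_length in (253, 450):
    for label, read_length in (("2 x 300 bp", 300), ("2 x 150 bp", 150)):
        overlap = paired_end_overlap(read_length, amplicon_length)
        status = "overlap" if overlap > 0 else "no overlap"
        print(f"{amplicon_length} bp amplicon, {label}: {overlap} bp ({status})")
```

A non-positive result means the two reads cannot be merged into one contig, which is the case the slide flags as poorly supported by current tools.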
15. 16S rRNA Sequencing
Sample -> DNA Extraction -> Genomic DNA -> PCR Amplification -> 16S Amplicons -> Next-Gen Sequencing -> Sequence Data
TGGGGAATATTGGACAATGGGGGG
AACCCTGATCCAGCCATGCCGCGT
GTGTGAAGAAGGCCTTATGGTTGT
AATGGGGAATATTGCACAATGGGC
GAAAGCCTGATGCAGCGACGCCGC
GTGAGGGATGGAGGCCTTCGGGTT
GTAAATAATGGGGAATATTGCACA
ATGGGCGAAAGCCTGATGCAGCGA
Slide modified from J. Wan
16. How are 16S sequence data analyzed?
§ Usually interested in taxa, not genotypes
§ Sequences can be grouped into taxa by:
• Traditional taxonomic classification (phylotypes)
• Phylogenetic tree
• Operational taxonomic units (OTU)
§ Operational taxonomic units (OTUs) are used to
represent groups of related organisms
§ OTUs at 3% sequence difference are used as a
proxy for species-level diversity
Slide modified from J. Wan
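The 3% cutoff can be illustrated with a minimal sketch. It assumes sequences are already aligned and of equal length, measures distance as the fraction of mismatched positions, and uses a simple greedy seed-based grouping. That greedy scheme is a stand-in chosen for brevity; it is not the average-linkage or complete-linkage clustering that mothur performs, and the function names and toy sequences are hypothetical.

```python
def distance(a, b):
    """Fraction of positions that differ between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, cutoff=0.03):
    """Assign each sequence to the first OTU whose seed is within the cutoff."""
    otus = []  # each OTU is a list; its first member is the seed
    for seq in seqs:
        for otu in otus:
            if distance(seq, otu[0]) <= cutoff:
                otu.append(seq)
                break
        else:
            otus.append([seq])
    return otus


# Toy 100-base sequences (hypothetical).
base = "ACGT" * 25
near = "TT" + base[2:]         # 2 mismatches -> 2% different from base
far = "T" * 10 + base[10:]     # 8 mismatches -> 8% different from base

otus = greedy_otus([base, near, far])
for i, otu in enumerate(otus, start=1):
    print(f"OTU {i}: {len(otu)} sequence(s)")
```

The 2% variant joins the seed's OTU, while the 8% variant founds a new one.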
17. Caution!
Sources of error in the 16S workflow (Sample -> DNA Extraction -> Genomic DNA -> PCR Amplification -> 16S Amplicons -> Next-Gen Sequencing -> Sequence Data):
§ Contamination
§ Polymerase error
§ Primer mismatch
§ Amplification bias
§ Chimera formation
§ Sequencing error
Slide modified from J. Wan
18. Caution!
§ At such high read numbers, errors are inevitable
§ When not accounted for, errors greatly inflate OTU
counts and diversity estimates
• Hundreds of “species-level” OTUs identified in
30,000 E. coli reads (Huse, Environ Microbiol. 2010)
§ Solutions:
• Single-linkage pre-cluster step (SLP)
• Alternatively, model sequencing errors and use
machine learning to remove noise (e.g., DADA)
Slide modified from J. Wan
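The idea behind a pre-cluster step can be sketched as follows: rank unique sequences by abundance, then fold each rare sequence into a more abundant one that differs by only a few bases, on the assumption that the rare variant is an error. This is a simplified illustration, not mothur's pre.cluster implementation; the function names, the `max_diffs` parameter, and the toy reads are hypothetical.

```python
from collections import Counter


def mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def pre_cluster(reads, max_diffs=1):
    """Merge rare sequences into abundant ones that differ by at most max_diffs bases."""
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    merged = {}  # representative sequence -> total read count
    for seq, n in ordered:
        for rep in merged:
            if len(rep) == len(seq) and mismatches(rep, seq) <= max_diffs:
                merged[rep] += n
                break
        else:
            merged[seq] = n
    return merged


# Toy data: two real sequences plus two single-base errors of the first one.
true_a = "ACGTACGTAC"
true_b = "GGGGGCCCCC"
reads = [true_a] * 50 + ["ACGTACGTAA"] + ["TCGTACGTAC"] + [true_b] * 20

merged = pre_cluster(reads, max_diffs=1)
print("unique sequences before:", len(set(reads)))
print("unique sequences after: ", len(merged))
```

Here four unique sequences collapse to two, which is the kind of deflation of spurious OTUs the slide describes.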
19. Software/Databases for Microbiome Analysis
§ Mothur (mothur.org) - full 16S analysis suite
§ QIIME (qiime.org) - full 16S analysis suite
§ MG-RAST server (metagenomics.anl.gov) - 16S and WGS
§ CloVR (clovr.org) - 16S and WGS
§ BioBakery (bitbucket.org/biobakery/biobakery)
§ BROAD Microbiome (microbiomeutil.sourceforge.net) - chimera detection,
OTU binning
§ Ribosomal Database Project (RDP; rdp.cme.msu.edu) - 16S and 28S
Fungal
• RDP Classifier (rdp-classifier.sourceforge.net/)
§ greengenes (greengenes.lbl.gov) - Taxonomy, 16S
§ IMG (img.jgi.doe.gov/imgm_hmp) - DOE Joint Genome Institutes; genome
annotation
§ PATRIC (patricbrc.org) - Pathogens
§ SILVA (arb-silva.de) - 16S, 18S, 28S
§ More tools listed @ HMP DACC: http://www.hmpdacc.org/tools_protocols/tools_protocols.php
20. Nephele: Microbiome Analysis in the Cloud
Microbiome analysis + Cloud computing = no hassle for installation and “on demand” analysis platform service
22. Focus Group for Usability Testing
Nephele is currently under development. We need your feedback to improve features and usability from a user's perspective, i.e., YOU!
Analysis Engine
Data Explorer
Please sign up and gain early access to Nephele (for testing purposes)!
nephele@mail.nih.gov
23. Mothur
§ “This project seeks to develop a single piece of open-
source, expandable software to fill the bioinformatics
needs of the microbial ecology community.”
§ Documentation:
• http://www.mothur.org/wiki/Mothur_manual
§ Support:
• http://www.mothur.org/forum/
§ Tutorials / Protocols
• http://www.mothur.org/wiki/Analysis_examples
• http://www.mothur.org/wiki/454_SOP
• http://www.mothur.org/wiki/MiSeq_SOP
25. Basic Workflow for 16S Analysis
§ 1. Remove unwanted reads and sequencing and PCR errors
• quality filtering
• pre.cluster/SLP
§ 2. Identify and remove chimeric sequences
• UCHIME
§ 3. Cluster operational taxonomic units (OTUs)
• average linkage (UPGMA), complete linkage
§ 4. Classify OTUs
• naïve Bayesian classification (Wang), BLAST
§ 5. Diversity analysis and plots
• Alpha diversity, beta diversity
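For step 5, alpha diversity summarizes diversity within one sample and beta diversity compares samples. A minimal sketch, assuming the Shannon index as the alpha measure and Bray-Curtis dissimilarity as the beta measure (two common choices; the slide does not name specific metrics). The OTU count vectors and sample names are made up for illustration.

```python
import math


def shannon(counts):
    """Shannon index (natural log) of one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples' OTU counts (0 = identical)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU counts for two samples over the same four OTUs.
sample_early = [50, 30, 20, 0]
sample_late = [10, 10, 40, 40]

print("alpha (Shannon), early:", round(shannon(sample_early), 3))
print("alpha (Shannon), late: ", round(shannon(sample_late), 3))
print("beta (Bray-Curtis):    ", bray_curtis(sample_early, sample_late))
```

Both functions expect the two samples to list counts for the same OTUs in the same order.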
26. Set up Environment
§ Open Terminal
§ cd [drag MiSeq_SOP folder into terminal] [Enter]
§ ls -al
§ export PATH=$PATH:/path/to/Desktop/mothurGUI/mothur (drag folder into terminal)
§ which mothur
§ mothur
§ quit()
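The `which mothur` check above can also be done from a script before launching an analysis. A small sketch using Python's standard `shutil.which`; the helper name `find_executable` is hypothetical, and the program name `mothur` is taken from the steps above.

```python
import shutil


def find_executable(name):
    """Return the full path of a program found on PATH, or None if it is not there."""
    return shutil.which(name)


path = find_executable("mothur")
if path is None:
    print("mothur not found on PATH; re-check the export PATH step")
else:
    print("mothur found at", path)
```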
27. Experimental Design
§ Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD.
(2013): Development of a dual-index sequencing strategy and
curation pipeline for analyzing amplicon sequence data on the
MiSeq Illumina sequencing platform. Applied and Environmental
Microbiology. 79(17):5112-20.
§ Total 362 samples
§ Test dataset: 21 samples
• Female 3: days 0-9 and days 141-150 (post-weaning)
• Mock
§ R1 vs. R2 (read1, read2 -- not replicates)
§ Timecourse, Early vs. Late
§ V4 region
§ Already demultiplexed by Illumina MiSeq software (one sample
per file)
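With one demultiplexed file per sample and read direction, a first bookkeeping step is to pair each sample's R1 and R2 files and label the sample as early or late. The sketch below assumes file names of the form `F3D0_S188_L001_R1_001.fastq`, where the first underscore-separated field is the sample name and `D<number>` is the day. That naming pattern, the example file names, and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict


def pair_reads(filenames):
    """Group fastq file names by sample, keyed by read direction (R1 / R2)."""
    pairs = defaultdict(dict)
    for name in filenames:
        parts = name.split("_")
        read = next((p for p in parts if p in ("R1", "R2")), None)
        if read is None:
            continue  # not a recognizable read file
        pairs[parts[0]][read] = name
    return dict(pairs)


def time_group(sample):
    """Label a sample early (days 0-9), late (days 141-150), mock, or unknown."""
    match = re.fullmatch(r"F3D(\d+)", sample)
    if match is None:
        return "mock" if sample.lower().startswith("mock") else "unknown"
    day = int(match.group(1))
    if 0 <= day <= 9:
        return "early"
    if 141 <= day <= 150:
        return "late"
    return "unknown"


# Hypothetical file names following the assumed pattern.
files = [
    "F3D0_S188_L001_R1_001.fastq", "F3D0_S188_L001_R2_001.fastq",
    "F3D141_S207_L001_R1_001.fastq", "F3D141_S207_L001_R2_001.fastq",
    "Mock_S280_L001_R1_001.fastq", "Mock_S280_L001_R2_001.fastq",
]
pairs = pair_reads(files)
for sample, reads in pairs.items():
    print(sample, time_group(sample), reads["R1"], reads["R2"])
```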
28. Mothur-formatted 16S Sequence Databases
§ SILVA
• Aligned Fasta, width 50,000 bases
• Used for alignment to make sure reads are in the correct
region
§ Gold (BROAD)
• Used for Chimera detection with chimera.slayer
§ RDP Classifier training
• Unaligned Fasta
• Use with accompanying .taxonomy file for classify.seqs
• Has mitochondria, chloroplast so you can use for filtering out
junk
§ Greengenes
• Unaligned Fasta
• Use with accompanying .taxonomy file for classify.seqs
• Use for actual classification of sequences and OTUs
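Filtering out junk lineages after classification can be sketched as a simple text filter. The example assumes each assignment is one line of `name<TAB>lineage`, with ranks separated by semicolons and an optional confidence in parentheses. That layout, the example lineages, and the function names are assumptions for illustration and may not match the exact output of classify.seqs.

```python
UNWANTED = ("mitochondria", "chloroplast")


def is_unwanted(lineage):
    """True if any unwanted term appears anywhere in the lineage string."""
    lowered = lineage.lower()
    return any(term in lowered for term in UNWANTED)


def filter_assignments(lines):
    """Keep (name, lineage) pairs whose lineage is not mitochondrial or chloroplast."""
    kept = []
    for line in lines:
        name, lineage = line.rstrip("\n").split("\t", 1)
        if not is_unwanted(lineage):
            kept.append((name, lineage))
    return kept


# Hypothetical assignments in the assumed tab-separated layout.
assignments = [
    "seq1\tBacteria(100);Firmicutes(100);Clostridia(98);",
    "seq2\tBacteria(100);Cyanobacteria(100);Chloroplast(100);",
    "seq3\tBacteria(100);Bacteroidetes(99);Bacteroidia(97);",
    "seq4\tBacteria(100);Proteobacteria(100);Rickettsiales(95);Mitochondria(95);",
]
kept = filter_assignments(assignments)
print("kept:", [name for name, _ in kept])
```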
29. Tutorials, other tools for today
§ MiSeq initial steps:
• http://www.mothur.org/wiki/MiSeq_SOP
§ Analysis
• http://www.mothur.org/wiki/454_SOP
§ Plot phylogenetic tree
• http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
§ Examples of plots
• http://qiime.org/tutorials/tutorial.html
30. Thank You
For questions or comments please contact:
andrew.oler@nih.gov
ScienceApps@niaid.nih.gov
Slides available here
(open in Safari or Internet Explorer):
http://collab.niaid.nih.gov/sites/research/SIG/Bioinformatics/
-> Next Gen Sequencing -> “16S Microbiome Analysis”