Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. Slides include a short description and links for each tool.
DISCLAIMER: This is a small subset of tools out there. No disrespect to methods not mentioned.
8. 16S/ITS community surveys
• Multiple target regions in 16S gene and ITS region
• Comparison of results requires amplification of same region
• Advantages
– Fast survey of large communities
– Mature set of tools and statistics for analysis
– Good for first round survey
• 454 16S tags or pyrotags (~700 bp) have been the preferred method
• Illumina MiSeq runs (2x150 bp, 2x250 bp) are the next workhorses
• Depth of sampling
– 2,000-6,000 reads/sample for simple communities
– 20,000 reads/sample for complex soil metagenomes
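Comparing samples sequenced to different depths usually means subsampling every sample to a common read count first. The following is a minimal sketch of that idea in plain Python, not the method of any particular tool. The function name `rarefy`, the OTU labels, the counts, and the chosen depth are all hypothetical.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one sample.

    otu_counts maps an OTU label to its read count. Returns None when the
    sample has fewer reads than the requested depth.
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None  # too shallow; such samples are typically dropped
    # Expand counts into one label per read, then subsample.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    return Counter(rng.sample(reads, depth))

# Hypothetical sample with 5,000 reads spread over three OTUs.
sample = {"OTU_1": 3500, "OTU_2": 1200, "OTU_3": 300}
rarefied = rarefy(sample, depth=2000)
print(sum(rarefied.values()))
```

A sample that fails the depth cutoff returns `None` here, which makes the trade-off explicit: a higher common depth keeps more reads per sample but discards more samples.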
9. 16S/ITS issues
• Lack of tools for processing ITS/fungal microbiome data sets
– RDP classifier targets only ITS
– No ITS reconstruction tools
• Amplification bias affects accuracy and replication
• Use of short reads prevents disambiguation of similar strains
• 16S or ITS may not differentiate between similar strains
– Clustering is done at 97%
– Regions may be >99% similar
• Sequencing error inflates number of OTUs
• Chloroplast 16S sequences can get amplified in plant metagenomes
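The point about 97% clustering can be illustrated with a toy greedy clustering routine. This is a sketch under strong assumptions: sequences are already aligned to the same length, identity is the plain fraction of matching positions, and the helper names (`identity`, `greedy_cluster`, `mutate`) and test sequences are hypothetical. Real OTU pickers use proper alignment and abundance sorting.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def greedy_cluster(seqs, threshold=0.97):
    """Put each sequence into the first cluster whose centroid it matches."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            # No centroid was close enough, so this sequence seeds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters

def mutate(seq, n):
    """Change the first n bases so that each one is guaranteed to differ."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] for c in seq[:n]) + seq[n:]

base = "ACGT" * 25              # 100 bp toy marker region
close_strain = mutate(base, 1)  # 99% identical to base
other_taxon = mutate(base, 5)   # 95% identical to base

clusters = greedy_cluster([base, close_strain, other_taxon])
print(len(clusters))
```

The strain that is 99% identical lands in the same OTU as the first sequence, so any difference between the two is invisible downstream, while the 95% sequence forms its own OTU.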
10. 16S/ITS sequence processing workflow
• Filter for contaminants and low quality reads
• Assemble overlapping reads
• Reduce datasets (clustering)
• Perform taxonomic classification and compute diversity metrics
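The first step of this workflow, filtering low quality reads, can be sketched as a mean-quality and length filter. This is an illustration only: it assumes Phred+33 encoded quality strings, and the thresholds (`min_mean_q=20`, `min_length=50`), function names, and reads are hypothetical rather than recommended values.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def filter_reads(records, min_mean_q=20, min_length=50):
    """Keep (read_id, sequence, quality) tuples that pass both cutoffs."""
    kept = []
    for read_id, seq, qual in records:
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_q:
            kept.append((read_id, seq, qual))
    return kept

# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
reads = [
    ("read_1", "ACGT" * 15, "I" * 60),  # long, high quality
    ("read_2", "ACGT" * 15, "#" * 60),  # long, low quality
    ("read_3", "ACGT" * 5, "I" * 20),   # high quality but too short
]
passed = filter_reads(reads)
print([read_id for read_id, _, _ in passed])
```

Contaminant screening (for example against a host genome) would be a separate step and is not shown here.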
16. QIIME
• http://www.qiime.org/
• Comprehensive suite of tools
– OTU picking
– Taxonomic classification
– Construction of phylogenetic trees
– Visualization
– Compute diversity statistics
• Available as Amazon EC2 image
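To make "diversity statistics" concrete, here is a small sketch of two common alpha diversity measures, the Shannon index and the Simpson index (1 minus the sum of squared proportions), computed from OTU counts. It is written in plain Python and is not QIIME's API; the function names and example counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def simpson_index(counts):
    """Simpson diversity 1 - sum(p^2): chance that two random reads differ."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)

# Hypothetical communities with the same number of OTUs and reads.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

print(shannon_index(even_community), simpson_index(even_community))
print(shannon_index(skewed_community), simpson_index(skewed_community))
```

Both communities contain four OTUs, but the even one scores higher on both indices, which is why richness alone is rarely reported on its own.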
17. Whole Genome Shotgun (WGS) Metagenomics
• Better classification with increasing number of complete genomes
• Focus on whole genome based phylogeny (whole genome phylotyping)
• Advantages
– No amplification bias like in 16S/ITS
• Issues
– Poor sampling of fungal diversity
– Assembly of metagenomes is complicated due to uneven coverage
– Requires high depth of coverage
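The coverage issues above follow from simple arithmetic: expected coverage of one genome is (reads x read length x fraction of reads from that organism) / genome size. The sketch below works this through. All numbers are hypothetical, and it assumes abundance is expressed as the fraction of sequenced reads rather than the fraction of cells.

```python
def expected_coverage(total_reads, read_length, genome_size, read_fraction):
    """Average fold coverage of one genome within a metagenome.

    read_fraction is the share of all reads that come from this organism
    (an assumption of this sketch; cell-based abundance would also need
    genome size and copy number corrections).
    """
    return total_reads * read_length * read_fraction / genome_size

# Hypothetical run: 10 million reads of 100 bp, two organisms with 5 Mb genomes.
dominant = expected_coverage(10_000_000, 100, 5_000_000, read_fraction=0.50)
rare = expected_coverage(10_000_000, 100, 5_000_000, read_fraction=0.01)
print(dominant, rare)
```

In this example the dominant organism is covered about 100-fold while the rare one reaches only about 2-fold, which is too thin to assemble. Raising the rare genome's coverage requires sequencing the whole community much more deeply.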
18. WGS sequence processing workflow
• Filter for low quality reads
• Assemble reads
• Perform taxonomic classification and compute diversity metrics
22. WGS sequence processing workflow
• Web-based classifiers
– MG-RAST: http://metagenomics.anl.gov/
– CAMERA: http://camera.calit2.net/
– IMG/M: http://img.jgi.doe.gov/cgi-bin/m/main.cgi
23. MetaPhlAn
• Unique clade-specific markers for sequenced bacteria and archaea
• 400 genera/4,000 genomes including HMP genomes
• Species level resolution
• MetaPhlAn 2 in the works
– Eukaryotes including Fungi
– Viruses
– Higher coverage of archaea
• Krona and GraPhlAn for visualization of output
• Websites
– https://bitbucket.org/nsegata/metaphlan
– http://huttenhower.sph.harvard.edu/metaphlan
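A taxonomic profile of this kind is typically a tab-separated table with a pipe-delimited lineage (rank prefixes such as `k__`, `p__`, `g__`, `s__`) and a relative abundance in percent. The sketch below parses such a table and pulls out the species-level rows. The exact layout is an assumption to verify against the output of the version you run, and the profile text and function name are hypothetical.

```python
RANKS = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}

def parse_profile(text):
    """Return (rank, name, abundance) for the deepest level of each row."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        deepest = clade.split("|")[-1]           # e.g. "s__Escherichia_coli"
        prefix, _, name = deepest.partition("__")
        rows.append((RANKS.get(prefix, prefix), name, float(abundance)))
    return rows

# Hypothetical profile, made up for illustration only.
profile = "\n".join([
    "#SampleID\tMetaphlan_Analysis",
    "k__Bacteria\t100.0",
    "k__Bacteria|p__Firmicutes\t60.0",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus|s__Lactobacillus_casei\t60.0",
    "k__Bacteria|p__Proteobacteria\t40.0",
    "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacteriales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia_coli\t40.0",
])
rows = parse_profile(profile)
species = [(name, ab) for rank, name, ab in rows if rank == "species"]
print(species)
```

Keeping only one rank at a time avoids double counting, since every parent clade repeats the abundance of its children.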
24. PhyloSift/pplacer
• Reference database of marker genes
• Places reads on the tree of life based on homology to reference proteins
• Integration with metAMOS for pre-assembling next-generation datasets
• Bacterial and archaeal classification only
• Plant and fungal marker genes are being added
• Websites
– http://phylosift.wordpress.com/
– https://github.com/gjospin/PhyloSift
25. Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
27. Thank you!
Surya Saha ss2489@cornell.edu
Suggestions
• Plan informatics workflow as early as possible
• Incorporate statistics at different stages in the workflow