Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences and whole genome shotgun sequences are described. The slides include a short description of, and links for, each tool.

DISCLAIMER: This covers only a small subset of the tools available. No disrespect is intended to methods not mentioned.

  1. Computational Tools for Metagenomics
     Surya Saha (Twitter: @SahaSurya / LinkedIn: www.linkedin.com/in/suryasaha/)
     Magdalen Lindeberg
     Plant Pathology & Plant-Microbe Biology
     Microbial Friends & Foes, Sep 25, 2012
  2. Impact of Technology on Metagenomics (Temperton, Current Opinion in Microbiology, 2012)
  3. Types of “Meta” genomics
     • 16S rRNA survey of bacterial microbiome
     • ITS survey of fungal microbiome
     (Bellemain, BMC Microbiology 2010; slide: Julien Tremblay, JGI)
  4. Types of “Meta” genomics: whole genome shotgun
     • Varying complexity of microbial communities
     • High coverage sequencing
     • Sophisticated informatics
     • Host-associated metagenomes
       – Deep sequencing of host metagenome
       – Bioinformatic screening of host sequences
     • Environmental metagenomes
       – e.g. soil samples
       – Require very high depth of coverage
       – Complicated to assemble
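Slide 4 mentions bioinformatic screening of host sequences. One simple way to picture that step is k-mer containment: a read is flagged as host-derived if most of its k-mers also occur in the host reference. The sketch below is hypothetical: the function names, the k = 21 default and the 50% cutoff are illustrative assumptions, and only the forward strand is checked. Production pipelines typically align reads to the host genome with a read mapper instead.

```python
# Hypothetical k-mer containment screen for host reads (forward strand only).

def kmer_set(seq, k):
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_seqs, k=21):
    """Collect every k-mer found in the host reference sequences."""
    index = set()
    for s in host_seqs:
        index |= kmer_set(s, k)
    return index


def is_host(read, host_index, k=21, min_frac=0.5):
    """Flag a read if at least min_frac of its distinct k-mers are in the host index."""
    words = kmer_set(read, k)
    if not words:
        return False  # read shorter than k: cannot be judged, so keep it
    shared = sum(w in host_index for w in words)
    return shared / len(words) >= min_frac


def screen(reads, host_index, k=21, min_frac=0.5):
    """Return only the reads that are not flagged as host-derived."""
    return [r for r in reads if not is_host(r, host_index, k, min_frac)]


host = ["ACGTACGGTCAGTTGCAAGT"]
index = build_host_index(host, k=5)
print(screen(["GTACGGTCAGTT", "TTTTTTTTTTTT"], index, k=5))  # ['TTTTTTTTTTTT']
```

A small k is used in the toy example only because the sequences are short.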
  5. Big picture!!
  6. Big picture!! What users see
  7. Big picture!! What users see / What users want!!
  8. 16S/ITS community surveys
     • Multiple target regions in the 16S gene and ITS region
     • Comparing results requires amplification of the same region
     • Advantages
       – Fast survey of large communities
       – Mature set of tools and statistics for analysis
       – Good for a first-round survey
     • 454 16S tags or pyrotags (~700 bp) have been the preferred method
     • Illumina MiSeq reads (2x150 bp, 2x250 bp) are the next workhorse
     • Depth of sampling
       – 2,000–6,000 reads/sample for simple communities
       – 20,000 reads/sample for complex soil metagenomes
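The depth-of-sampling figures on slide 8 mean that samples often end up with very different read counts. A common way to compare diversity across such samples is rarefaction: randomly subsampling every sample, without replacement, to the same number of reads. The sketch below assumes per-sample OTU counts stored in a dict; `rarefy` and the example counts are hypothetical.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Subsample one sample's reads without replacement to a fixed depth.

    otu_counts: mapping of OTU id -> read count for a single sample.
    depth:      number of reads to keep; must not exceed the sample total.
    seed:       fixed for reproducibility; rarefaction is a random draw.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    # One list entry per read, so that abundant OTUs are drawn more often.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


# Hypothetical counts for one sample, rarefied to 2,000 reads.
sample = {"OTU_1": 5000, "OTU_2": 800, "OTU_3": 200}
print(rarefy(sample, depth=2000))
```

Expanding counts into a per-read list keeps the code obvious; for very deep samples a multivariate hypergeometric draw would avoid building that list.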
  9. 16S/ITS issues
     • Lack of tools for processing ITS/fungal microbiome datasets
       – RDP classifier does not target ITS
       – No ITS reconstruction tools
     • Amplification bias affects accuracy and replication
     • Short reads prevent disambiguation of similar strains
     • 16S or ITS may not differentiate between similar strains
       – OTU clustering is typically done at 97% identity
       – Regions may be >99% similar
     • Sequencing errors inflate the number of OTUs
     • Chloroplast 16S sequences can get amplified in plant metagenomes
  10. 16S/ITS sequence processing workflow
      Filter for contaminants and low-quality reads → Assemble overlapping reads → Reduce datasets (clustering) → Perform taxonomic classification and compute diversity metrics
  11. 16S/ITS sequence processing workflow: filter for contaminants and low-quality reads
      • Quality plots and read trimming
        – FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
        – FASTX http://hannonlab.cshl.edu/fastx_toolkit/
      • Chimera removal
        – AmpliconNoise http://code.google.com/p/ampliconnoise/
        – UCHIME http://www.drive5.com/uchime/
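FastQC and FASTX handle quality inspection and trimming on slide 11. To show what 3'-end quality trimming does, here is a simplified sketch, assuming Phred+33 encoded quality strings. It is not FASTX's exact algorithm, and the `min_q` and `min_len` defaults are illustrative.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq, qual, min_q=20, min_len=50):
    """Remove trailing bases with quality < min_q from the 3' end.

    Returns the trimmed (seq, qual) pair, or None when the trimmed read is
    shorter than min_len and should be discarded.
    """
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end while the base call is below the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTACGT", "IIIIII##", min_q=20, min_len=4))  # ('ACGTAC', 'IIIIII')
```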
  12. Impact of Sequence Length (slide: Feng Chen, JGI)
  13. 16S/ITS sequence processing workflow: assemble overlapping reads
      • Merge overlapping paired-end reads
        – FLASH http://www.genomics.jhu.edu/software/FLASH/index.shtml
        – FastqJoin http://code.google.com/p/ea-utils/wiki/FastqJoin
        – CD-HIT read-linker http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit-auxtools-manual
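The read mergers on slide 13 join paired-end reads whose ends overlap. The core idea can be sketched as: reverse-complement read 2, score every candidate overlap length, and keep the one with the lowest mismatch fraction. This is a simplified illustration under stated assumptions (uppercase ACGTN reads, fragments longer than either read, base qualities ignored); `merge_pair` and its parameters are hypothetical, not any tool's options.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge two reads from opposite ends of one fragment, or return None.

    Every overlap length from the longest possible down to min_overlap is
    scored by its mismatch fraction; the lowest fraction wins, and ties go to
    the longer overlap because it is tested first.
    """
    r2rc = reverse_complement(r2)  # put read 2 on the same strand as read 1
    best = None  # (mismatch_fraction, overlap_length)
    for ov in range(min(len(r1), len(r2rc)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-ov:], r2rc[:ov]))
        frac = mismatches / ov
        if frac <= max_mismatch_frac and (best is None or frac < best[0]):
            best = (frac, ov)
    if best is None:
        return None
    # In the overlap we simply keep read 1's bases; real tools use qualities.
    return r1 + r2rc[best[1]:]


fragment = "ACGTACGTTTGGCCAAGGTT"
r1 = fragment[:14]
r2 = reverse_complement(fragment[6:])
print(merge_pair(r1, r2, min_overlap=5))  # ACGTACGTTTGGCCAAGGTT
```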
  14. 16S/ITS sequence processing workflow: reduce datasets (clustering)
      • Clustering with high stringency
        – UCLUST/USEARCH (16S only) http://www.drive5.com/usearch/
        – CD-HIT-OTU (16S only) http://weizhong-lab.ucsd.edu/cd-hit-otu/
        – phylOTU (16S only) https://github.com/sharpton/PhylOTU
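The clustering tools on slide 14 reduce reads to OTUs at a fixed identity threshold. A greedy, centroid-based scheme in the spirit of UCLUST can be sketched as follows: each sequence joins the first centroid it matches at 97% identity or more, otherwise it becomes a new centroid. Identity is approximated here as 1 minus edit distance over the longer length, an assumption made for brevity; the real tools use their own alignment and identity definitions.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Approximate pairwise identity: 1 - edit distance / longer length."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    Input order decides which sequences become centroids, so seqs should be
    pre-sorted (e.g. by decreasing abundance). Returns {centroid: members}.
    """
    clusters = {}
    for s in seqs:
        for centroid in clusters:
            if identity(s, centroid) >= threshold:
                clusters[centroid].append(s)
                break
        else:  # no centroid was close enough
            clusters[s] = [s]
    return clusters


base = "A" * 50 + "C" * 50
variant = "GG" + base[2:]  # 2 substitutions in 100 bp -> 98% identity
other = "G" * 100
clusters = greedy_cluster([base, variant, other])
print({c[:5]: len(m) for c, m in clusters.items()})  # {'AAAAA': 2, 'GGGGG': 1}
```

Comparing every sequence to every centroid is quadratic; the production tools add k-mer based prefiltering to avoid most alignments.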
  15. 16S/ITS sequence processing workflow: perform taxonomic classification and compute diversity metrics
      • Composition-based classifiers
        – RDP database + classifier http://rdp.cme.msu.edu/classifier/classifier.jsp
      • Homology-based classifiers
        – ARB + SILVA database (16S only) http://www.arb-home.de/
        – Greengenes database (16S only) http://greengenes.lbl.gov/cgi-bin/nph-index.cgi
        – UNITE database (ITS only) http://unite.ut.ee/
        – FungalITSPipeline (ITS only) http://www.emerencia.org/fungalitspipeline.html
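The RDP classifier on slide 15 is composition-based: it scores a query by the short words (k-mers, 8 bases by default) it shares with each taxon in a training set, using a naive Bayesian model. The sketch below is modeled on that idea, with a word prior (n(w) + 0.5) / (N + 1) and a per-taxon word probability (m(w) + prior) / (M + 1), but it omits the bootstrap confidence estimate and is only an illustration; `train`, `classify` and the toy reference set are hypothetical.

```python
import math
from collections import defaultdict


def words(seq, k):
    """Set of overlapping k-mers ('words') present in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(reference, k=8):
    """Build a word-count model from (taxon, sequence) training pairs."""
    seqs_with_word = defaultdict(int)                   # n(w) over all sequences
    taxon_word = defaultdict(lambda: defaultdict(int))  # m(w) within each taxon
    taxon_size = defaultdict(int)                       # M: sequences per taxon
    for taxon, seq in reference:
        taxon_size[taxon] += 1
        for w in words(seq, k):
            seqs_with_word[w] += 1
            taxon_word[taxon][w] += 1
    return {"k": k, "n": len(reference), "n_w": seqs_with_word,
            "m_w": taxon_word, "M": taxon_size}


def classify(model, query):
    """Return the taxon maximising the sum of log P(word | taxon)."""
    best_taxon, best_score = None, -math.inf
    for taxon, size in model["M"].items():
        score = 0.0
        for w in words(query, model["k"]):
            # Word prior; words unseen in training get n(w) = 0.
            p_w = (model["n_w"].get(w, 0) + 0.5) / (model["n"] + 1)
            m_w = model["m_w"][taxon].get(w, 0)
            score += math.log((m_w + p_w) / (size + 1))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


reference = [("A", "ACGTACGTACGT"), ("A", "ACGTACGTACGA"),
             ("B", "TTTTGGGGCCCC"), ("B", "TTTTGGGGCCCA")]
model = train(reference, k=4)  # k=4 only because the toy sequences are short
print(classify(model, "TTTGGGGCCC"))  # B
```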
  16. QIIME http://www.qiime.org/
      • Comprehensive suite of tools
        – OTU picking
        – Taxonomic classification
        – Construction of phylogenetic trees
        – Visualization
        – Computation of diversity statistics
      • Available as an Amazon EC2 image
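Diversity statistics like those QIIME reports are computed from per-sample OTU counts. Below is a minimal sketch of three common ones: Shannon entropy, the Gini-Simpson index and the bias-corrected Chao1 richness estimator. These are the textbook formulas, not QIIME's own code; tools differ in details such as the logarithm base used for Shannon, so the base is a parameter here, and the example counts are hypothetical.

```python
import math


def shannon(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def gini_simpson(counts):
    """1 - sum(p_i^2): chance that two reads drawn with replacement differ in OTU."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


otu_counts = [120, 45, 10, 2, 1, 1]
print(shannon(otu_counts), gini_simpson(otu_counts), chao1(otu_counts))
```

Because Chao1 leans on singletons, it is sensitive to the sequencing-error OTU inflation noted on slide 9.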
  17. Whole Genome Shotgun (WGS) Metagenomics
      • Classification improves as the number of complete genomes increases
      • Focus on whole-genome-based phylogeny (whole-genome phylotyping)
      • Advantages
        – No amplification bias, unlike 16S/ITS
      • Issues
        – Poor sampling of fungal diversity
        – Assembly of metagenomes is complicated by uneven coverage
        – Requires high depth of coverage
  18. WGS sequence processing workflow
      Filter for low-quality reads → Assemble reads → Perform taxonomic classification and compute diversity metrics
  19. WGS sequence processing workflow: filter for low-quality reads
      • Quality plots and read trimming
        – FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
        – FASTX http://hannonlab.cshl.edu/fastx_toolkit/
  20. WGS sequence processing workflow: assemble reads
      • NGS assembly with uneven depth
        – IDBA-UD http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/
        – MIRA http://www.chevreux.org/projects_mira.html
        – Velvet / MetaVelvet http://www.ebi.ac.uk/~zerbino/velvet/ http://metavelvet.dna.bio.keio.ac.jp/
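IDBA-UD and Velvet/MetaVelvet on slide 20 are de Bruijn graph assemblers. The toy sketch below shows the basic data structure: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and non-branching paths are read off as contigs (unitigs). It also illustrates the uneven-coverage problem from slide 17: a single global k-mer count cutoff, meant to remove sequencing errors, also discards k-mers from low-coverage organisms. This is an illustration only, assuming error-free forward-strand reads; it is not how any of these assemblers is implemented.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def de_bruijn(kmer_counts, min_count=1):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for kmer, c in kmer_counts.items():
        if c >= min_count:  # global cutoff: also drops rare genomes' k-mers
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    indeg = Counter(v for targets in graph.values() for v in targets)
    contigs = []
    for node in list(graph):
        # Interior nodes (one edge in, one edge out) are covered by some walk.
        if indeg[node] == 1 and len(graph[node]) == 1:
            continue
        for nxt in graph[node]:
            contig = node + nxt[-1]
            while indeg[nxt] == 1 and len(graph.get(nxt, ())) == 1:
                nxt = next(iter(graph[nxt]))
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


reads = ["ACGTTGCA", "GTTGCATT"]
counts = count_kmers(reads, k=4)
print(unitigs(de_bruijn(counts)))               # ['ACGTTGCATT']
print(unitigs(de_bruijn(counts, min_count=2)))  # ['GTTGCA']
```

The second call shows the trade-off: k-mers seen only once (the ends of the sequence here) vanish, just as k-mers from a low-abundance community member would.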
  21. WGS sequence processing workflow: perform taxonomic classification and compute diversity metrics
      • Hybrid composition/homology-based classifiers
        – FCP http://kiwi.cs.dal.ca/Software/FCP
        – Phymm/PhymmBL http://www.cbcb.umd.edu/software/phymm/
        – AMPHORA2 http://wolbachia.biology.virginia.edu/WuLab/Software.html
        – NBC http://nbc.ece.drexel.edu/
        – MEGAN http://ab.inf.uni-tuebingen.de/software/megan/
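MEGAN, listed on slide 21, assigns each read to the lowest common ancestor (LCA) of the taxa it hits in a homology search, so a read with equally good hits in several genera is placed at a higher rank rather than forced into one. A minimal sketch of that idea follows, assuming hits are already available as (taxon, bit score) pairs and each taxon has a root-to-leaf lineage. The filter names and default values are illustrative, loosely modeled on MEGAN's min-score and top-percent filters, not its actual implementation; the bit scores are made up.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-leaf lineages (lists of taxon names)."""
    lca = []
    for ranks in zip(*lineages):
        if any(r != ranks[0] for r in ranks):
            break
        lca.append(ranks[0])
    return lca


def assign_read(hits, lineage, min_score=50.0, top_percent=10.0):
    """Assign one read from its (taxon, bit_score) hits; [] means unassigned.

    Hits below min_score are dropped, then only hits scoring within
    top_percent of the best remaining hit take part in the LCA.
    """
    good = [(t, s) for t, s in hits if s >= min_score]
    if not good:
        return []
    best = max(s for _, s in good)
    cutoff = best * (1 - top_percent / 100)
    return lowest_common_ancestor([lineage[t] for t, s in good if s >= cutoff])


lineage = {
    "Escherichia": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                    "Enterobacterales", "Enterobacteriaceae", "Escherichia"],
    "Salmonella": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                   "Enterobacterales", "Enterobacteriaceae", "Salmonella"],
    "Bacillus": ["Bacteria", "Firmicutes", "Bacilli",
                 "Bacillales", "Bacillaceae", "Bacillus"],
}
hits = [("Escherichia", 200.0), ("Salmonella", 190.0), ("Bacillus", 100.0)]
print(assign_read(hits, lineage)[-1])  # Enterobacteriaceae
```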
  22. WGS sequence processing workflow: perform taxonomic classification and compute diversity metrics
      • Web-based classifiers
        – MG-RAST http://metagenomics.anl.gov/
        – CAMERA http://camera.calit2.net/
        – IMG/M http://img.jgi.doe.gov/cgi-bin/m/main.cgi
  23. MetaPhlAn
      • Unique clade-specific markers for sequenced bacteria and archaea
      • 400 genera / 4,000 genomes, including HMP genomes
      • Species-level resolution
      • MetaPhlAn 2 in the works
        – Eukaryotes, including fungi
        – Viruses
        – Higher coverage of archaea
      • Krona and GraPhlAn for visualization of output
      • Websites
        – https://bitbucket.org/nsegata/metaphlan
        – http://huttenhower.sph.harvard.edu/metaphlan
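MetaPhlAn estimates abundances from reads that hit clade-specific marker genes. The sketch below shows the general idea under simplifying assumptions: per-marker read counts are normalized by marker length, averaged within each clade, and rescaled to relative abundances. The real tool's estimator is more involved than the plain mean used here, and `clade_abundance`, its input layout and the example numbers are hypothetical.

```python
from collections import defaultdict


def clade_abundance(marker_hits, marker_length, marker_clade):
    """Relative clade abundances from reads mapped to clade-specific markers.

    marker_hits:   marker id -> reads mapped to that marker (missing = 0)
    marker_length: marker id -> marker length in bp
    marker_clade:  marker id -> the clade the marker is unique to
    """
    per_clade = defaultdict(list)
    for marker, clade in marker_clade.items():
        # Length-normalise so long markers do not dominate the estimate.
        per_clade[clade].append(marker_hits.get(marker, 0) / marker_length[marker])
    # Plain mean over each clade's markers (a simplification).
    coverage = {c: sum(v) / len(v) for c, v in per_clade.items()}
    total = sum(coverage.values())
    if total == 0:
        return {}
    return {c: cov / total for c, cov in coverage.items()}


abund = clade_abundance(
    marker_hits={"m1": 10, "m2": 20, "m3": 30},
    marker_length={"m1": 100, "m2": 200, "m3": 100},
    marker_clade={"m1": "clade_A", "m2": "clade_A", "m3": "clade_B"},
)
print(abund)
```

Because each marker is unique to one clade, a read hitting it is evidence for that clade alone, which is what lets this simple per-clade averaging work without an LCA step.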
  24. PhyloSift/pplacer
      • Reference database of marker genes
      • Places reads on the tree of life based on homology to reference proteins
      • Integration with metAMOS for pre-assembling next-generation datasets
      • Bacterial and archaeal classification only
      • Plant and fungal marker genes are being added
      • Websites
        – http://phylosift.wordpress.com/
        – https://github.com/gjospin/PhyloSift
  25. Real cost of sequencing!! (Sboner, Genome Biology, 2011)
  26. Acknowledgements
      Funding
      Magdalen Lindeberg, Cornell University
      Dave Schneider, USDA-ARS, Ithaca
      Citrus greening / Wolbachia (wACP)
  27. Thank you!
      Surya Saha ss2489@cornell.edu
      Suggestions
      • Plan the informatics workflow as early as possible
      • Incorporate statistics at different stages in the workflow
