Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

UEB-VHIR's Metagenomics Training. Session 1. 2013/08/26. An Introduction to Metagenomics Data Analysis. Ferran Briansó (ferran.brianso@vhir.org)

Published in: Education, Technology
1 Comment
  • Very nice presentation!

    For sample size/power calculations and biostatistical analysis I would like to refer readers to our open source paper

    http://www.plosone.org/article/info:doi/10.1371/journal.pone.0052078

    which is based on the Dirichlet-multinomial model. A second paper for analyzing taxonomic trees is at

    http://www.ploscollections.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048996

    Both papers link to open source software.

    As statisticians we have a different focus than bioinformatics and evolutionary/ecological software.

    Feel free to contact us with questions

    Bill Shannon, PhD, MBA
    william.shannon@sbcglobal.net

Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013

  1. An Introduction to Metagenomics Data Analysis. Metagenomics Training. Ferran Briansó, VHIR - 26/08/2013, ferran.brianso@vhir.org
  2. Outline • Introduction to Metagenomics • Basic Terminology • Computational Approaches & Tools • Whole Genome Shotgun • 16S/ITS Community Surveys • Recommended Tools • MEGAN • mothur • QIIME • AXIOME & CloVR
  3. Introduction to METAGENOMICS
  4. Introduction. First use of the term metagenome, referencing the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome. Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998). "Molecular biological access to the chemistry of unknown soil microbes: A new frontier for natural products". Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9. PMID 9818143
  5. Introduction. “The application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.” Chen, K.; Pachter, L. (2005). "Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities". PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024
  6. Introduction. Source: US Division of Earth & Life Studies of the National Academies http://dels-old.nas.edu/metagenomics/overview.shtml
  7. Introduction. Source: US Division of Earth & Life Studies of the National Academies http://dels-old.nas.edu/metagenomics/overview.shtml
  8. Introduction. Source:
  9. Introduction. Performance Comparison for (some) Platforms. Source: Feng Chen, JGI
  10. Basic TERMINOLOGY
  11. Terminology
      • Trimming: the pre-processing step of cleaning sequence data (primers, multiplexing barcodes...) from automated DNA sequencers prior to sequence assembly and other downstream uses.
      • Binning: the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).
      • OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study, typically defined by a percent sequence similarity threshold for classifying microbes within the same, or different, OTUs.
  12. Terminology (continued)
      • Chimeras: artificial sequences formed during PCR amplification. The majority of them are believed to arise from incomplete extension. During subsequent cycles of PCR, a partially extended strand can bind to a template derived from a different but similar sequence. This then acts as a primer that is extended to form a chimeric sequence (Smith et al., 2010; Thompson et al., 2002; Meyerhans et al., 1990; Judo et al., 1998; Odelberg, 1995). A chimeric template is created during one round, then amplified by subsequent rounds to produce chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
  13. Terminology (continued)
      • Alpha diversity: the diversity within a particular area or ecosystem, expressed by the number of species (i.e., species richness) in that ecosystem, or by one or more diversity indices.
      • Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species change between the ecosystems.
      • Gamma diversity: a measure of the overall diversity within a large region. Geographic-scale species diversity according to Hunter (2002:448).
  14. Terminology (continued)
      • Rarefaction: allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. A rarefaction curve plots the number of species as a function of the number of samples.
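
The diversity and rarefaction terms defined in the Terminology slides can be illustrated with a minimal sketch, assuming each sample is an OTU count table (OTU name mapped to read count). The slides do not name specific measures, so the Shannon index and the Bray-Curtis dissimilarity are assumptions chosen here as common examples of a "diversity index" and of "species change" between samples. The function names, sample names, and counts are hypothetical, and rarefaction is done by simple random subsampling of reads rather than by any particular tool's method.

```python
import math
import random


def richness(counts):
    """Alpha diversity as species richness: OTUs seen at least once."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Alpha diversity as the Shannon index H' = -sum(p_i * ln(p_i))."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def bray_curtis(a, b):
    """Beta diversity as Bray-Curtis dissimilarity between two samples.

    0 means identical composition, 1 means no shared OTUs.
    """
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


def rarefaction_curve(counts, depths, n_iter=100, seed=0):
    """Mean observed richness at each subsampling depth.

    Reads are drawn without replacement; each depth must not exceed
    the total number of reads in the sample.
    """
    rng = random.Random(seed)
    # Expand the count table into one entry per read.
    reads = [otu for otu, c in counts.items() for _ in range(c)]
    curve = []
    for depth in depths:
        observed = [len(set(rng.sample(reads, depth))) for _ in range(n_iter)]
        curve.append(sum(observed) / n_iter)
    return curve


# Hypothetical OTU count tables for two samples.
gut = {"OTU1": 50, "OTU2": 30, "OTU3": 20}
soil = {"OTU1": 10, "OTU4": 90}

print("richness:", richness(gut), richness(soil))
print("Shannon:", shannon(gut), shannon(soil))
print("Bray-Curtis:", bray_curtis(gut, soil))
print("rarefaction:", rarefaction_curve(gut, [1, 5, 20, 100]))
```

Plotting the returned means against the depths gives a rarefaction curve; a curve that flattens suggests that additional sampling would reveal few new OTUs.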
  15. Computational APPROACHES & TOOLS
  16. Approaches & Tools
  17. Approaches & Tools
  18. Approaches & Tools
  19. Approaches & Tools
  20. Whole Genome SHOTGUN
  21. Whole Genome Shotgun
  22. WGS Workflow
  23. WGS Workflow
  24. WGS Workflow
  25. WGS Workflow
  26. Examples of WGS Tools
  27. Examples of WGS Tools
  28. Analysis of 16S/ITS Community Surveys
  29. 16S/ITS community surveys
  30. 16S/ITS issues
  31. 16S/ITS workflow
  32. 16S/ITS workflow
  33. 16S/ITS workflow
  34. 16S/ITS workflow
  35. Some recommended Tools
  36. Some (recommended) Tools: mothur, MEGAN
  37. MEGAN 2007 → 2011 → ... 2012 →
  38. MEGAN 4 for 16S rRNA
  39. MEGAN 4 for 16S rRNA
  40. mothur 2009 →
  41. mothur 2009 →
  42. QIIME
  43. Integrative Tools/Platforms
  44. AXIOME
  45. AXIOME
  46. AXIOME
  47. CloVR http://www.edgebio.com
  48. CloVR http://www.edgebio.com http://clovr.org
  49. CloVR
  50. CloVR
  51. CloVR
  52. CloVR
  53. CloVR
  54. Thanks for your attention. An Introduction to Metagenomics Data Analysis. Ferran Briansó, MGTraining 26/08/2013, ferran.brianso@vhir.org. More info at http://ueb.vhir.org/MGT
