[13.07.07] albertsen mewe13 metagenomics

  1. Metagenomics - Potentials and pitfalls. Mads Albertsen, MEWE 2013. CENTER FOR MICROBIAL COMMUNITIES
  2. Agenda: Introduction, Pitfalls, Potentials, Recommendations. CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
  3. Introduction: Genome = parts list of a single organism
  4. Introduction: Metagenome = parts list of the community (photo: D. Kunkel; color, E. Latypova)
  5. Introduction: "...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil." (J. Handelsman et al., 1998)
  6. Introduction: the Handelsman et al. (1998) quote, plus a PubMed search for metagenom*[Title/Abstract]
  7. Introduction: the Handelsman et al. (1998) quote, the PubMed search for metagenom*[Title/Abstract], and sequencing costs (http://www.genome.gov/sequencingcosts/)
  8. Introduction: Metagenomics ≠ amplicon sequencing
  9. Sequencing and assembly: ≈3,000,000 bp per genome; ≈1000+ bp contigs; 150 bp reads
  10. Assigning information: contigs; function; taxonomy; databases; binning
  11. What has metagenomics been used for? Exploration: Rusch et al., 2007, PLoS Biology • 6.3 Gbp of sequence (2x human genomes, 2000x bacterial genomes) • Most sequences were novel compared to the databases. Qin et al., 2010, Nature • 127 human gut metagenomes • 600 Gbp of sequence (200x human genomes) • 3.3 million genes identified • Minimal gut metagenome defined
  12. What has metagenomics been used for? Comparative: Dinsdale et al., 2008, Nature • A characteristic microbial fingerprint for each of the nine different ecosystem types. Specific functions: Hess et al., 2011, Science • Identified 27,755 putative carbohydrate-active genes from a cow rumen metagenome • Expressed 90 candidates, of which 57% had enzymatic activity against cellulosic substrates
  13. What has metagenomics been used for? Extracting genomes: Garcia Martin et al., 2006, Nat. Biotechnol. • Genome extraction from a low-complexity metagenome • Candidatus Accumulibacter phosphatis • The first genome of a polyphosphate-accumulating organism (PAO), which plays a major role in enhanced biological phosphorus removal. Albertsen et al., 2013, Nat. Biotechnol. • Genome extraction of low-abundance species (< 0.1%) from metagenomes • First complete TM7 genome • Access to genomes of the "uncultured majority"
  14. Pitfalls
  15. Metagenomics made easy: great resources, but use with care
  16. MG-RAST example: contigs
  17. Dataset overview
  18. Taxonomy and function overview
  19. Compare with other samples (samples vs. functional categories)
  20. Pitfalls: You always get billions of data points!
  21. Pitfalls: Is your DNA extraction OK? ...and that of the samples you want to compare with? Did you sequence enough? Do you know the GC bias of your protocol? Did you normalize for sequencing depth? Did you use the same sequencing platform? Assembled data are not quantitative! Are you comparing assembled data with reads?
  22. Databases: contigs are annotated against databases to give an annotated metagenome ...you only see what is in the database
  23. What is in the databases? Finished genomes in IMG: 29 phyla, 46 classes, 100 orders, 1268 species. Greengenes 16S rRNA database: 90 phyla, 249 classes, 405 orders, 99,322 species (97% clustering). Note: only 1 strain per species included.
  24. MG-RAST example: contigs; 650,000 EBPR proteins with taxonomy assigned. How similar are they to the genomes in the database?
  25. Sludge microbes vs. database genomes: 650,000 EBPR proteins. Note: not abundance weighted
  26. Sludge microbes vs. database genomes: 650,000 EBPR proteins vs. 1,260,000 human gut proteins (Qin et al., 2010, Nature; RAST ID: 4448044.3). Note: not abundance weighted
  27. Sludge microbes vs. database genomes: the 7 genera with the most EBPR proteins assigned
  28. Effect of missing genomes: What is the effect of not having closely related genomes in the database? 1. Remove a genome from the database. 2. Search the removed genome against the database.
  29. Effect of missing genomes, best hit: the 4326 Accumulibacter phosphatis proteins are searched (blastp) against the related genomes in the database (Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5)
  30. Effect of missing genomes, best hit: same setup as slide 29, now showing Azoarcus
  31. Effect of missing genomes, MEGAN LCA: Accumulibacter phosphatis (4326 proteins) vs. related genomes (blastp; Bacteria 1268, Proteobacteria 564, Betaproteobacteria 84, Rhodocyclales 5, Rhodocyclaceae 5). Lowest common ancestor (LCA) approach: Hit 1: Betaproteobacteria, 80% ID; Hit 2: Gammaproteobacteria, 79% ID; Hit 3: Actinobacteria, 59% ID → assigned to Proteobacteria
  32. Effect of missing genomes, MEGAN LCA (same setup and LCA example as slide 31). Assignment of the 4326 proteins (chart labels as extracted): Genus; No hits 261; Bacteria 325; Proteobacteria 860; Beta- 853; Rhodocyclaceae 1149 • 27% correctly classified on genus level • 54% not assigned the correct class • 101 genera identified
  33. Effect of missing genomes, MEGAN LCA: Nitrospira defluvii (4268 proteins) vs. related genomes (blastp; Bacteria 1268, Nitrospirae 3) • 1% correctly classified on phylum level
  34. Effect of missing genomes, MEGAN LCA + KEGG: Nitrospira defluvii vs. related genomes (blastp; Bacteria 1268, Nitrospirae 3). What about function?
  35. Effect of missing genomes, MEGAN LCA + KEGG: Nitrospira defluvii vs. related genomes (blastp; Bacteria 1268, Nitrospirae 3)
  36. Effect of missing genomes, MEGAN LCA + KEGG: Nitrospira defluvii vs. related genomes (blastp; Bacteria 1268, Nitrospirae 3)
  37. Implication of missing genomes: Function A, Function B, Function C, Function D
  38. Pitfalls: You always get billions of data points!
  39. Potentials
  40. Potentials: 1. Hunting novel antibiotic resistance genes 2. Extracting genomes from metagenomes
  41. Hunting novel antibiotic resistance genes: What if you want to find something that is not in the database?
  42. Hunting novel antibiotic resistance genes: functional metagenomics (M. Sommer, DTU, Denmark, in prep.)
  43. Hunting novel antibiotic resistance genes: 89 different antibiotic resistance genes, 19 of them novel (M. Sommer, DTU, Denmark, in prep.)
  44. Hunting novel antibiotic resistance genes: How abundant are the antibiotic resistance genes in the environment?
  45. Hunting novel antibiotic resistance genes: The number of metagenome reads reflects the abundance of the bacteria (bacteria → reads)
  46. Hunting novel antibiotic resistance genes: bacteria → reads
  47. Hunting novel antibiotic resistance genes: bacteria → reads
  48. Hunting novel antibiotic resistance genes: antibiotic resistance genes across metagenomes; 89 different antibiotic resistance genes (M. Sommer, DTU, Denmark, in prep.)
  49. Extracting genomes
  50. Extracting genomes: ≈3,000,000 bp per genome; ≈1000+ bp contigs; 150 bp reads. Why not full genomes?
  51. Extracting genomes: ≈3,000,000 bp per genome; ≈1000+ bp contigs; 150 bp reads. Why not full genomes? 1. Micro-diversity 2. Separation of genomes (binning)
  52. Extracting genomes: Not 1 strain, but many closely related strains (AAAAAAAAAAAAAA, AAAAAAAAATAAAA, AAAAAAAAACAAAA). What you get from assembly: the shared AAAAAAAAA, followed by the alternatives AAAAA, TAAAA and CAAAA
  53. Extracting genomes: Metagenome assembly is not quantitative!
  54. Reduce micro-diversity: high micro-diversity → short-term enrichment → low micro-diversity
  55. Extracting genomes: ≈3,000,000 bp per genome; ≈1000+ bp contigs; 150 bp reads. Why not full genomes? 1. Micro-diversity 2. Separation of genomes (binning)
  56. Binning: genomic signatures (GC / codon usage; tetranucleotide frequency) + statistical method. Complex sample → PhD student → "Binning"
  57. Binning: genomic signatures (GC / codon usage; tetranucleotide frequency) + statistical method. Complex sample → PhD student → "Binning". Problems: short pieces of sequence (1-10 kbp); local sequence divergence
  58. Binning: sequence composition-independent binning; abundance in sample 1, abundance in sample 2
  59. Binning: sequence composition-independent binning; abundance in sample 1 plotted against abundance in sample 2
  60. Binning: 1. Reduce micro-diversity 2. Use multiple related samples (abundance in sample 1 vs. abundance in sample 2)
  61. Binning: 1. Reduce micro-diversity 2. Use multiple related samples (abundance in sample 1 vs. abundance in sample 2)
  62. Simple reactors (H. Daims & C. Dorninger, DOME, University of Vienna) • Nitrospira enrichment running for years • 3 dominant species • No micro-diversity
  63. Short-term enrichment: full-scale EBPR plant → SBR reactor (days). 1. Reduction of (micro)-diversity (Competibacter). Albertsen et al., 2013, Nat. Biotech.
  64. Short-term enrichment: full-scale EBPR plant → SBR reactor. 2. Two different DNA extraction methods. Albertsen et al., 2013, Nat. Biotech.
  65. Colored using a set of 100 phylogenetic marker genes. Albertsen et al., 2013, Nat. Biotech.
  66. Colored using a set of 100 phylogenetic marker genes: TM7-1 (1.6%), TM7-2 (0.7%), TM7-3 (0.2%), TM7-4 (0.06%). Albertsen et al., 2013, Nat. Biotech.
  67. Zoom on target TM7-2 (0.7%), colored using a set of 100 phylogenetic marker genes. Albertsen et al., 2013, Nat. Biotech.
  68. Zoom on target TM7-2 (0.7%): PCA on genomic signatures (PC1 vs. PC2), colored using a set of 100 phylogenetic marker genes. Albertsen et al., 2013, Nat. Biotech.
  69. TM7-1 (1.6%): candidate phylum TM7 (Saccharibacteria), Candidatus Saccharimonas aalborgensis; colored using a set of 100 phylogenetic marker genes. Albertsen et al., 2013, Nat. Biotech.
  70. Candidatus Competibacter denitrificans (10.6%). Albertsen et al., 2013, Nat. Biotech.; poster by S. McIlroy
  71. Genome assembly validation: phyla; essential single-copy genes (HMM models); assembly inspection. Albertsen et al., 2013, Nat. Biotech.
  72. Multi-metagenome (Albertsen et al., 2013, Nat. Biotech.): http://madsalbertsen.github.io/multi-metagenome/ (short: goo.gl/0ctA3) • Guides • Workflow scripts • Example data • All the code • Recommendations
  73. Multi-metagenome: Highly complex environments... ...add more samples! (Talk by SM. Karst)
  74. Potentials: metagenomics (DNA), metatranscriptomics (mRNA), metaproteomics (proteins) and metabolomics (metabolites); in situ methods; community structure; microbial functions. Microbial needs: extraction, P-removal, N-removal, -removal, foaming, ethanol production
  75. Recommendations • Do you really need metagenomics? • Are the databases useful in your environment? • Unless human-related, they are not... • Metagenomics is just the parts list ...of the DNA that could be extracted ...and of the functions that could be annotated • Validation, validation, validation! • Bioinformatic • In situ • Genome extraction from simple reactors is possible • Enables comprehensive transcriptomics
  76. Metagenomics is pretty... ...but not always informative
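
Slides 9 and 21 ask whether you sequenced enough. A rough answer comes from the standard coverage relation: mean coverage = reads × read length / genome size. The sketch below uses a hypothetical helper, `reads_needed`. It assumes that a genome's share of the reads equals its relative abundance and that coverage is uniform. GC bias and extraction bias (slide 21) will violate both assumptions. The 20x target is an illustrative choice, not a value from the talk.

```python
def reads_needed(genome_size_bp: int, read_length_bp: int,
                 target_coverage: float, relative_abundance: float) -> int:
    """Total reads needed for one genome to reach a target mean coverage.

    Uses coverage = reads_on_genome * read_length / genome_size and assumes
    the genome receives a share of all reads equal to its relative abundance.
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    reads_on_genome = target_coverage * genome_size_bp / read_length_bp
    return round(reads_on_genome / relative_abundance)


# ~3 Mbp genome at 0.1% of the community, 150 bp reads, 20x mean coverage
print(reads_needed(3_000_000, 150, 20, 0.001))  # 400000000
```

Consider a ≈3,000,000 bp genome at 0.1% of the community, the low-abundance range mentioned on slide 13. With 150 bp reads and a 20x target, this works out to 400,000 reads on the genome and about 400 million reads in total.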
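
Slide 21 asks whether you normalized for sequencing depth. One common option is rarefaction: randomly subsample every sample down to the depth of the shallowest one before comparing counts. Below is a minimal standard-library sketch. The function `rarefy`, the feature names and the counts are hypothetical. Expanding one list entry per read is only practical for small toy inputs.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> Counter:
    """Subsample reads without replacement so the sample has exactly `depth` reads.

    `counts` maps a feature (gene, function, taxon) to its read count.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} reads in the sample")
    # One entry per read, then draw `depth` reads at random (fixed seed for reproducibility).
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    return Counter(random.Random(seed).sample(pool, depth))


shallow = {"geneX": 300, "geneY": 700}    # 1,000 reads
deep = {"geneX": 1_500, "geneY": 8_500}   # 10,000 reads
depth = min(sum(shallow.values()), sum(deep.values()))
print(rarefy(deep, depth))
```

An alternative that keeps all the data is to divide counts by the total number of reads (for example, counts per million). Which option fits depends on the downstream comparison.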
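
Slide 31 illustrates the lowest common ancestor (LCA) idea used by MEGAN: a protein is assigned to the deepest taxon shared by its best hits. The sketch below reproduces the slide's example under three simplifications:

- Only hits within a "top percent" window of the best score are kept. MEGAN has a parameter of that name, but it works on bit scores and applies further filters such as minimum score and minimum support.
- Percent identity stands in for the score.
- Lineages are plain tuples.

The function names are hypothetical.

```python
Lineage = tuple[str, ...]


def lowest_common_ancestor(lineages: list[Lineage]) -> Lineage:
    """Longest shared prefix of root-to-leaf lineages (domain, phylum, class, ...)."""
    shared: list[str] = []
    for ranks in zip(*lineages):  # walk down one taxonomic rank at a time
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


def assign_by_lca(hits: list[tuple[Lineage, float]], top_percent: float = 10.0) -> Lineage:
    """Keep hits scoring within `top_percent` of the best hit and return their LCA."""
    if not hits:
        return ()  # no hits: the protein stays unassigned
    best = max(score for _, score in hits)
    cutoff = best * (1 - top_percent / 100)
    return lowest_common_ancestor([lin for lin, score in hits if score >= cutoff])


hits = [
    (("Bacteria", "Proteobacteria", "Betaproteobacteria"), 80.0),
    (("Bacteria", "Proteobacteria", "Gammaproteobacteria"), 79.0),
    (("Bacteria", "Actinobacteria", "Actinobacteria"), 59.0),
]
print(assign_by_lca(hits))  # ('Bacteria', 'Proteobacteria')
```

The 59% Actinobacteria hit falls outside the 10% window, so only the Beta- and Gammaproteobacteria hits vote, and the protein lands at phylum level as on the slide. A wider window pushes assignments further up the tree, which is how missing close relatives in the database (slides 28-33) turn into coarse assignments.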
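
Slides 28-33 evaluate classification by removing a genome from the database and searching its proteins against what remains. Scoring such a test means comparing each protein's assigned lineage with the known lineage of the removed genome, rank by rank. The sketch below shows one way to do that bookkeeping. The function and toy data are hypothetical, and the talk does not say exactly how its percentages were computed.

```python
Lineage = tuple[str, ...]


def rank_accuracy(true_lineage: Lineage, assigned: list[Lineage],
                  rank_names: list[str]) -> dict[str, float]:
    """Fraction of proteins assigned to the correct taxon at each rank.

    A protein counts as correct at rank i only if its assignment reaches that
    rank and agrees with the true lineage on every rank down to it; proteins
    with no hit, or assigned only to a higher rank, count as not correct.
    """
    if not assigned:
        raise ValueError("no assignments to score")
    n = len(assigned)
    return {
        rank: sum(1 for a in assigned if a[: i + 1] == true_lineage[: i + 1]) / n
        for i, rank in enumerate(rank_names)
    }


ranks = ["domain", "phylum", "class", "order", "family", "genus"]
truth = ("Bacteria", "Proteobacteria", "Betaproteobacteria",
         "Rhodocyclales", "Rhodocyclaceae", "Accumulibacter")
assigned = [
    truth,                                                  # correct down to genus
    truth[:2],                                              # stopped at phylum
    (),                                                     # no hit
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),  # wrong class
]
print(rank_accuracy(truth, assigned, ranks))
# {'domain': 0.75, 'phylum': 0.75, 'class': 0.25, 'order': 0.25, 'family': 0.25, 'genus': 0.25}
```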
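
Slides 44-48 estimate how abundant the resistance genes are by counting metagenome reads, on the premise (slide 45) that read numbers reflect abundance. Raw counts are not directly comparable: longer genes collect more reads, and deeper sequencing yields more reads overall. A common correction is RPKM (reads per kilobase of gene per million reads). The sketch below assumes reads have already been mapped to each gene. It is not necessarily the normalization used in the work cited on the slides, and `rpkm` is a hypothetical helper name.

```python
def rpkm(mapped_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene per million reads sequenced."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    # reads / (length in kb * total reads in millions) = reads * 1e9 / (length * total)
    return mapped_reads * 10**9 / (gene_length_bp * total_reads)


# The same gene (1,000 bp) with 500 mapped reads in two metagenomes of different depth
print(rpkm(500, 1_000, 10_000_000))  # 50.0
print(rpkm(500, 1_000, 20_000_000))  # 25.0
```

The same 500 reads mean half the abundance in a metagenome sequenced twice as deeply. Likewise, a gene twice as long needs twice the reads to score the same.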
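
Slide 52 shows why closely related strains break assemblies: where strains differ, the assembler meets several possible continuations and cannot pick one. The toy function below is hypothetical and far simpler than a real de Bruijn graph assembler, which sees the same situation as branches in its graph. It finds the variable columns in already-aligned strain sequences and reports the conserved stretches between them.

```python
def split_at_variants(strains: list[str]) -> list:
    """Toy model of how strain variation fragments an assembly.

    Takes aligned, equal-length strain sequences and returns conserved
    segments (strings) interleaved with variable columns (sets of bases).
    Each variable column is a point where an assembler has to choose.
    """
    length = len(strains[0])
    if any(len(s) != length for s in strains):
        raise ValueError("toy model expects aligned, equal-length sequences")
    segments: list = []
    start = 0  # start of the current conserved segment
    for i in range(length):
        column = {s[i] for s in strains}
        if len(column) > 1:  # the strains disagree at this position
            if i > start:
                segments.append(strains[0][start:i])
            segments.append(column)
            start = i + 1
    if start < length:
        segments.append(strains[0][start:])
    return segments


strains = ["AAAAAAAAAAAAAA", "AAAAAAAAATAAAA", "AAAAAAAAACAAAA"]
print(split_at_variants(strains))
```

With the three strains from the slide, it returns the shared AAAAAAAAA prefix, a three-way choice (A, T or C) and the shared AAAA suffix. Together these are the AAAAA / TAAAA / CAAAA alternatives on the slide.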
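
Slides 56-57 list the classic composition-based binning signals: GC content, codon usage and tetranucleotide frequency, combined with a statistical method. The sketch below computes GC content and tetranucleotide frequency for a contig; the function names are hypothetical. It counts each 4-mer together with its reverse complement, which gives 136 features. This is a widely used convention, since a contig may come from either strand. The resulting vectors are the input a clustering or PCA step (as on slide 68) would work on. On 1-10 kbp contigs (slide 57) they are noisy, because each contig contributes only a few thousand 4-mers.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> list[str]:
    """All k-mers merged with their reverse complements (136 for k = 4)."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, reverse_complement(km)) for km in kmers})


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequency(seq: str) -> list[float]:
    """Relative frequency of each canonical 4-mer, in canonical_kmers(4) order."""
    kmers = canonical_kmers(4)
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        km = seq[i : i + 4]
        if set(km) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[index[min(km, reverse_complement(km))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```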
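
Slides 58-61 describe sequence composition-independent binning. Contigs from the same genome share that genome's abundance, so plotting each contig's coverage in sample 1 against sample 2 separates genomes whose abundance changes differently between the samples. The greedy grouping below only illustrates that idea. The log scale, the pseudo-count, the 0.2 distance threshold, the function names and the coverage values are all assumptions. This is not the procedure of Albertsen et al. (2013), whose workflow is provided with the multi-metagenome guides (slide 72).

```python
import math


def log_coverage(cov_sample1: float, cov_sample2: float) -> tuple[float, float]:
    # log10 scale, with a pseudo-count of 1 so zero coverage stays finite
    return (math.log10(cov_sample1 + 1), math.log10(cov_sample2 + 1))


def greedy_coverage_bins(contigs: dict[str, tuple[float, float]],
                         max_dist: float = 0.2) -> list[list[str]]:
    """Group contigs whose (log) coverage in two samples lies close together."""
    bins: list[dict] = []
    for name, (cov1, cov2) in sorted(contigs.items()):
        point = log_coverage(cov1, cov2)
        for b in bins:
            if math.dist(point, b["centroid"]) <= max_dist:
                b["members"].append(name)
                n = len(b["members"])
                # update the running mean of the member points
                b["centroid"] = tuple(c + (p - c) / n for c, p in zip(b["centroid"], point))
                break
        else:  # no existing bin close enough: start a new one
            bins.append({"centroid": point, "members": [name]})
    return [b["members"] for b in bins]


# Hypothetical mean coverage of five contigs in sample 1 and sample 2
contigs = {"c1": (50, 5), "c2": (55, 6), "c3": (48, 5), "c4": (3, 40), "c5": (4, 38)}
print(greedy_coverage_bins(contigs))  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

Two genomes with similar abundance in both samples still collapse onto the same point. More samples (slide 73) help here, as do composition signals such as tetranucleotide frequency.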
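
Slide 71 validates genome bins with essential single-copy genes found by HMM models. A complete, clean bacterial genome should contain each such gene exactly once. Missing markers therefore point to an incomplete bin, and duplicated markers to contamination from another genome or a mix of strains. The sketch assumes the HMM search has already produced a list of detected marker names. The marker names and counts are hypothetical, and the summary is a simple tally, not the exact statistics used in the paper.

```python
from collections import Counter


def marker_summary(detected: list[str], marker_set: set[str]) -> dict[str, float]:
    """Tally single-copy marker genes detected in one genome bin.

    `detected` has one entry per marker hit, so a marker found twice appears twice.
    """
    counts = Counter(m for m in detected if m in marker_set)
    return {
        "completeness": len(counts) / len(marker_set),                 # fraction of markers present
        "multi_copy_markers": sum(1 for n in counts.values() if n > 1),
        "extra_copies": sum(n - 1 for n in counts.values()),          # hits beyond one per marker
    }


markers = {f"marker{i:02d}" for i in range(1, 11)}                      # 10 hypothetical markers
detected = [f"marker{i:02d}" for i in range(1, 9)] + ["marker02", "marker03"]
print(marker_summary(detected, markers))
# {'completeness': 0.8, 'multi_copy_markers': 2, 'extra_copies': 2}
```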
