[2013.12.02] Mads Albertsen: Extracting Genomes from Metagenomes

Invited lecture at University of Vienna on extracting genomes from metagenomes.

Published in: Education, Technology

  1. Extracting genomes from metagenomes. Mads Albertsen, PhD Student (2011-2014). 02-12-2013 @ University of Vienna. CENTER FOR MICROBIAL COMMUNITIES
  2. Aalborg. Per H. Nielsen.
  3. Microbial Ecology: Who - when, where and why? CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
  4. Biological wastewater treatment. Sewerage system. Occasional breakdowns. Microbial Ecology. (Nielsen et al., 2012, Curr. Opin. Biotechnol. 23:452-9)
  5. MiDAS. Map: Hjørring, Aalborg, Århus, Copenhagen, Odense. Since 2006: 4 samples / year = 7; 2 samples / year = 6; some years = 16. (Nielsen et al., 2012, Curr. Opin. Biotechnol. 23:452-9)
  6. qFISH. 30 abundant core genera in all Danish EBPR WWTPs. Functional studies using MAR-FISH. (Nielsen et al., 2012, Curr. Opin. Biotechnol. 23:452-9)
  7. www.midasfieldguide.org
  8. Understanding ecosystems. Omics: Metabolites (Metabolomics), Proteins (Metaproteomics), mRNA (Metatranscriptomics), DNA (Metagenomics). In situ methods. Community structure, microbial functions, microbial needs. P-Removal: N-Removal: -Removal: Foaming: Ethanol production: (Albertsen et al., 2012, ISME J 6:1094-106)
  9. Understanding ecosystems. Omics: Metabolites (Metabolomics), Proteins (Metaproteomics), mRNA (Metatranscriptomics), DNA (Metagenomics). In situ methods. Omics requires good reference genomes! Community structure, microbial functions, microbial needs. P-Removal: N-Removal: -Removal: Foaming: Ethanol production: (Albertsen et al., 2012, ISME J 6:1094-106)
  10. Available genomes. (Albertsen et al., 2012, ISME J 6:1094-106)
  11. How do we get the genomes? Culturing: few microorganisms can be easily cultured (<<5%). Tetrasphaera: Kristiansen et al., 2013, ISME J 7:543-54. Microthrix: McIlroy et al., 2013, ISME J 7:1161-72.
  12. How do we get the genomes? What you think you study. What you actually study.
  13. How do we get the genomes? Culturing: few microorganisms can be easily cultured (<<5%). Single cell genomics: only routinely performed in specialized labs; very incomplete genomes (mean 40%, range 10-90%); www.bigelow.org.
  14. How do we get the genomes? Culturing: few microorganisms can be easily cultured (<<5%). Single cell genomics: only routinely performed in specialized labs; very incomplete genomes (mean 40%, range 10-90%); www.bigelow.org. Metagenomics.
  15. What is a genome? Genome = Parts list of a single species.
  16. What is a metagenome? Metagenome = Parts list of the community. (Photo: D. Kunkel; color, E. Latypova)
  17. What is a metagenome? “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998
  18. Metagenomics is hot! “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998. PubMed: metagenom*[Title/Abstract]
  19. Sequencing is cheap! “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998. PubMed: metagenom*[Title/Abstract]. Sequencing costs: http://www.genome.gov/sequencingcosts/
  20. Metagenomics. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp) → Search against database.
  21. Metagenomics. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp) → Search against database. Phylogenetic classification: Who is there? Bacterium A, Bacterium B, ..., Bacterium X. Functional classification: What can they do? Gene A, Gene B, ..., Gene X.
  22. Metagenomics. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp) → Search against database. Phylogenetic classification: Who is there? Bacterium A, Bacterium B, ..., Bacterium X. Functional classification: What can they do? Gene A, Gene B, ..., Gene X. Omics requires good reference genomes!
  23. “If you want to understand the ecosystem you need to understand the individual species in the ecosystem”
  24. Metagenomics. Lion + Eagle ≠ Flying Lion.
  25. Metagenomics. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp). Why not full genomes?
  26. Metagenomics. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp). Why not full genomes? 1. Micro-diversity. 2. Separation of genomes (Binning).
  27. Micro-diversity. Not 1 strain, but many closely related strains: AAAAAAAAAAAAAA, AAAAAAAAATAAAA, AAAAAAAAACAAAA. What you get after assembly: AAAAAAAAA followed by separate fragments AAAAA, TAAAA and CAAAA.
  28. Micro-diversity. High micro-diversity. Low micro-diversity. Short term enrichment.
  29. Binning. 100++ abundant species (≈3 Mbp each) → DNA extraction → Sequencing → Reads (100-150 bp) → Assembly → Contigs (1000+ bp). Why not full genomes? 1. Micro-diversity. 2. Separation of genomes (Binning).
  30. Binning. Complex sample. PhD student. “Binning”.
  31. Binning. Genomic signatures (e.g. GC and codon usage). Tetranucleotide frequency + statistical method. Complex sample. PhD student. “Binning”.
  32. Binning. Genomic signatures (e.g. GC and codon usage). Tetranucleotide frequency + statistical method. Complex sample. PhD student. “Binning”. Limitations: short pieces of DNA sequence (1-10 kbp); local sequence divergence.
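
Slides 31-32 name GC content and tetranucleotide frequency as genomic signatures for binning. The following is a minimal Python sketch of how such signatures could be computed for one contig. The function names are illustrative and are not taken from the lecture or from the multi-metagenome scripts. It also makes two simplifying assumptions: reverse-complement tetramers are not merged, and the "statistical method" mentioned on the slide is left out.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible tetramers in seq.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    tetramers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in tetramers)
    return {t: (counts[t] / total if total else 0.0) for t in tetramers}


# Toy contig, far shorter than the 1-10 kbp pieces mentioned on slide 32.
contig = "ACGTACGTTGCAACGT"
print(round(gc_content(contig), 2))
print(tetranucleotide_frequencies(contig)["ACGT"])
```

On short contigs each tetramer is observed only a few times, so the 256-value profile is noisy. That is one way to read the slide's caveat about short pieces of DNA sequence.
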
  33. “Metagenomics can be used to measure the abundance of the organisms in the original sample.”
  34. Binning. Original sample → Sequencing → Metagenome reads → Assembly → Scaffolds → Mapping → Abundance (3x, 1x, 1x).
  36. Binning. Sequence composition-independent binning. Plot: abundance in Sample 1.
  37. Binning. Sequence composition-independent binning. Plots: abundance in Sample 1; abundance in Sample 2.
  38. Binning. Sequence composition-independent binning. Plot: abundance in Sample 1 vs. abundance in Sample 2.
  39. Binning. 1. Reduce micro-diversity. 2. Use multiple related samples. Plot: abundance in Sample 1 vs. abundance in Sample 2.
  40. Binning. 1. Reduce micro-diversity. 2. Use multiple related samples. Plots: abundance in Sample 1 vs. abundance in Sample 2.
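
Slides 34-40 describe binning by abundance: reads are mapped back to the scaffolds, coverage is used as an abundance estimate, and scaffolds are compared across two related samples. The sketch below illustrates that idea in Python under stated assumptions. The function names, the rectangular selection window and all coverage values are hypothetical and do not come from the lecture.

```python
def coverage(mapped_bases, scaffold_length):
    """Average coverage: bases of mapped reads divided by scaffold length."""
    return mapped_bases / scaffold_length


def select_bin(scaffolds, range_sample1, range_sample2):
    """Return scaffolds whose coverage lies inside both ranges.

    scaffolds maps a scaffold name to (coverage in sample 1, coverage in sample 2).
    """
    lo1, hi1 = range_sample1
    lo2, hi2 = range_sample2
    return [
        name
        for name, (cov1, cov2) in scaffolds.items()
        if lo1 <= cov1 <= hi1 and lo2 <= cov2 <= hi2
    ]


# Hypothetical coverage values for four scaffolds in two samples.
scaffolds = {
    "scaffold_1": (30.0, 5.0),
    "scaffold_2": (31.0, 5.5),
    "scaffold_3": (30.5, 40.0),  # same abundance as 1 and 2 in sample 1 only
    "scaffold_4": (3.0, 5.2),    # same abundance as 1 and 2 in sample 2 only
}

print(select_bin(scaffolds, (25, 35), (4, 7)))
```

In this toy data neither sample alone separates scaffold_1 and scaffold_2 from the others, while the combination of both samples does. That is the motivation for using multiple related samples.
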
  41. Binning. • Nitrospira enrichment running for years • 3 dominant species • No micro-diversity. (H. Daims & C. Dorninger, DOME, University of Vienna)
  42. 1. Reduction of (micro)-diversity. Full-scale EBPR plant. Short term enrichment (days). SBR reactor. (Albertsen et al., 2013, Nat. Biotech.)
  43. 2. Two different DNA extraction methods. Full-scale EBPR plant. Short term enrichment. SBR reactor. (Albertsen et al., 2013, Nat. Biotech.)
  44. Colored using a set of 100 phylogenetic marker genes. (Albertsen et al., 2013, Nat. Biotech.)
  45. Colored using a set of 100 phylogenetic marker genes. TM7-1 (1.6%), TM7-2 (0.7%), TM7-3 (0.2%), TM7-4 (0.06%). (Albertsen et al., 2013, Nat. Biotech.)
  46. Colored using a set of 100 phylogenetic marker genes. Zoom on target: TM7-2 (0.7%). (Albertsen et al., 2013, Nat. Biotech.)
  47. Colored using a set of 100 phylogenetic marker genes. Zoom on target: TM7-2 (0.7%). PCA on genomic signatures (axes: PC1, PC2). (Albertsen et al., 2013, Nat. Biotech.)
  48. Colored using a set of 100 phylogenetic marker genes. TM7-1 (1.6%): Candidatus Saccharimonas aalborgensis. Candidate phylum TM7: Saccharibacteria. (Albertsen et al., 2013, Nat. Biotech.)
  49. Genome validation. Assembly inspection. Essential single copy genes. Figure labels: Genes (HMM models), Phyla. (Albertsen et al., 2013, Nat. Biotech.)
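
Slide 49 lists essential single copy genes as a way to validate an extracted genome. One common way to use such genes is to count how many of the expected markers are found in a bin and how many are found more than once. The Python sketch below illustrates this under stated assumptions. The function name, the marker names and the counts are all hypothetical, and the slide does not say that the lecture's own validation was computed this way.

```python
def assess_bin(marker_counts, marker_set):
    """Summarize single copy marker genes found in one genome bin.

    marker_counts maps a marker name to the number of hits in the bin.
    marker_set is the list of markers expected exactly once per genome.
    """
    if not marker_set:
        raise ValueError("marker_set must not be empty")
    present = sum(1 for m in marker_set if marker_counts.get(m, 0) >= 1)
    duplicated = sum(1 for m in marker_set if marker_counts.get(m, 0) > 1)
    return {
        # Share of expected markers seen at least once: rough completeness.
        "completeness": present / len(marker_set),
        # Share of expected markers seen more than once: hints at a mixed bin.
        "duplicated_fraction": duplicated / len(marker_set),
    }


# Hypothetical marker names and hit counts.
markers = ["marker_1", "marker_2", "marker_3", "marker_4"]
hits = {"marker_1": 1, "marker_2": 1, "marker_3": 2}
print(assess_bin(hits, markers))
```

A high share of duplicated markers would suggest that the bin contains scaffolds from more than one genome, which is the kind of problem the assembly inspection on the same slide is meant to catch.
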
  50. In situ confirmation. P.L. Larsen, S.J. McIlroy.
  51. Multi-metagenome: http://madsalbertsen.github.io/multi-metagenome/ (short: goo.gl/0ctA3). • Guides • Workflow scripts • Example data • All the code • Recommendations. R markdown enables reproducible and transparent genome extractions. (Albertsen et al., 2013, Nat. Biotech.)
  52. It’s just a potential! ...and a poor description of it.
  53. Competibacter (GAO989). Competibacter has the potential to negatively influence phosphorus removal in wastewater treatment. Literature disagreement on glycolytic pathways, with consequences for modeling. Candidatus Competibacter odensis (44%). (McIlroy and Albertsen et al., 2013, ISME J (AOP))
  54. Competibacter. FISH with Competibacter specific probe. MAR with H3-labeled glucose. (McIlroy and Albertsen et al., 2013, ISME J (AOP))
  55. Obtaining genomes is easy… but they are useless without high quality annotations, in situ validations and good questions!
  56. Questions? ma@bio.aau.dk, @MadsAlbertsen85, MadsAlbertsen. Acknowledgements: Per H. Nielsen, Simon J. McIlroy, Søren M. Karst, EB group. University of Vienna: C. Dorninger, H. Daims. University of Queensland: G.W. Tyson, P. Hugenholtz.
  57. Databases. Contigs + Databases → Annotated metagenome. ...you only see what is in the database.
  58. What is in the databases? Finished genomes in IMG vs. Greengenes 16S rRNA database:
        Rank      Genomes   16S
        Phyla          29   90
        Class          46   249
        Order         100   405
        Species      1268   99322*
      *97% clustering. Note: only including 1 strain per species.
  59. MG-RAST example. Contigs: 650,000 EBPR proteins with taxonomy assigned. How similar are they to the genomes in the database?
  60. Sludge microbes vs. database genomes. 650,000 EBPR proteins. Note: not abundance weighted.
  61. Sludge microbes vs. database genomes. 650,000 EBPR proteins. 1,260,000 Human gut. Qin et al., 2010, Nature. RAST ID: 4448044.3. Note: not abundance weighted.
  62. Sludge microbes vs. database genomes. The 7 genera with most EBPR proteins assigned.
  63. Effect of missing genomes. What is the effect of not having closely related genomes in the database? 1. Remove a genome from the database. 2. Search the removed genome against the database.
  64. Effect of missing genomes. Accumulibacter phosphatis (4326 proteins) → blastp → Best hit. Related genomes: Bacteria 1268; Proteobacteria 564; Betaproteobacteria 84; Rhodocyclales 5; Rhodocyclaceae 5.
  65. Effect of missing genomes. Accumulibacter phosphatis (4326 proteins) → blastp → Best hit: Azoarcus. Related genomes: Bacteria 1268; Proteobacteria 564; Betaproteobacteria 84; Rhodocyclales 5; Rhodocyclaceae 5.
  66. Effect of missing genomes. Accumulibacter phosphatis (4326 proteins) → blastp → MEGAN LCA. Lowest common ancestor (LCA) approach: Hit 1: Beta-proteobacteria, 80% ID; Hit 2: Gamma-proteobacteria, 79% ID; Hit 3: Actinobacteria, 59% ID. Assigned to Proteobacteria. Related genomes: Bacteria 1268; Proteobacteria 564; Betaproteobacteria 84; Rhodocyclales 5; Rhodocyclaceae 5.
  67. Effect of missing genomes. Accumulibacter phosphatis (4326 proteins) → blastp → MEGAN LCA. Lowest common ancestor (LCA) approach: Hit 1: Beta-proteobacteria, 80% ID; Hit 2: Gamma-proteobacteria, 79% ID; Hit 3: Actinobacteria, 59% ID. Assigned to Proteobacteria. Of the 4326 proteins: • 27% correctly classified on genus level • 54% not assigned the correct class • 101 genera identified. Chart labels: Bacteria 325; Proteobacteria 860; Beta- 853; Rhodocyclaceae 1149; Genus; No hits 261. Related genomes: Bacteria 1268; Proteobacteria 564; Betaproteobacteria 84; Rhodocyclales 5; Rhodocyclaceae 5.
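
Slides 66-67 describe the lowest common ancestor (LCA) approach with three example hits. The Python sketch below reproduces that example under stated assumptions. The function names and the `top_fraction` cutoff are hypothetical, and percent identity is used as the score only because that is what the slide shows. MEGAN's actual scoring and parameters are not reproduced here.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each listed from root downwards)."""
    shared = None
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared = ranks[0]
        else:
            break
    return shared


def assign_taxon(hits, top_fraction=0.9):
    """Assign a protein to the LCA of its near-best hits.

    hits is a list of (lineage, score); only hits scoring at least
    top_fraction times the best score are considered.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    kept = [lineage for lineage, score in hits if score >= best * top_fraction]
    return lowest_common_ancestor(kept)


# The three hits from slide 66, with percent identity as the score.
hits = [
    (["Bacteria", "Proteobacteria", "Betaproteobacteria"], 80),
    (["Bacteria", "Proteobacteria", "Gammaproteobacteria"], 79),
    (["Bacteria", "Actinobacteria"], 59),
]

print(assign_taxon(hits))                    # Proteobacteria
print(assign_taxon(hits, top_fraction=0.5))  # Bacteria
```

With the assumed cutoff, the 59% hit is ignored and the protein is assigned to Proteobacteria, as on the slide. Loosening the cutoff pulls the assignment up to Bacteria, which shows how distant hits push assignments toward the root when no close reference genome is in the database.
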
  68. Effect of missing genomes. Nitrospira defluvii (4268 proteins) → blastp → MEGAN LCA. Phylum: 1% correctly classified on phylum level. Related genomes: Bacteria 1268; Nitrospirae 3.
  69. Effect of missing genomes. Nitrospira defluvii → blastp → MEGAN LCA + KEGG. What about function? Related genomes: Bacteria 1268; Nitrospirae 3.
  70. Effect of missing genomes. Nitrospira defluvii → blastp → MEGAN LCA + KEGG. Related genomes: Bacteria 1268; Nitrospirae 3.
  72. Implication of missing genomes. Function A, Function B, Function C, Function D.
  73. Pitfalls. You always get billions of data!
