[2013.12.02] Mads Albertsen: Extracting Genomes from Metagenomes

Invited lecture at University of Vienna on extracting genomes from metagenomes.

Transcript

  • 1. Extracting genomes from metagenomes Mads Albertsen PhD Student (2011-2014) 02-12-2013 @ University of Vienna CENTER FOR MICROBIAL COMMUNITIES
  • 2. Aalborg Per H. Nielsen
  • 3. Microbial Ecology: Who - when, where and why? CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
  • 4. Biological wastewater treatment Sewerage system Occasional breakdowns Microbial Ecology Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9
  • 5. Hjørring Aalborg Århus Copenhagen MiDAS Odense Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9 Since 2006 4 samples / year = 7 2 samples / year = 6 Some years = 16
  • 6. qFISH 30 abundant core genera in all Danish EBPR WWTPs Functional studies using MAR-FISH Nielsen et al., 2012 Curr. Opin. Biotechnol. 23:452-9
  • 7. www.midasfieldguide.org
  • 8. Understanding ecosystems Metabolites Meta-bolomics Proteins Omics mRNA Meta-proteomics Meta-transcriptomics DNA In Situ methods Community structure Microbial functions Meta-genomics Microbial needs P-Removal: N-Removal: -Removal: Foaming: Ethanol production: Albertsen et al., 2012, ISME J 6: 1094-106
  • 9. Understanding ecosystems Metabolites Meta-bolomics Proteins Omics mRNA Meta-proteomics Meta-transcriptomics DNA In Situ methods Meta-genomics Omics requires good reference genomes! Community structure Microbial functions Microbial needs P-Removal: N-Removal: -Removal: Foaming: Ethanol production: Albertsen et al., 2012, ISME J 6: 1094-106
  • 10. Available genomes (+) (+) Albertsen et al., 2012, ISME J 6: 1094-106 (+)
  • 11. How do we get the genomes? Culturing Few microorganisms can be easily cultured (<<5%) Tetrasphaera: Kristiansen et al., 2013, ISME J 7: 543-54 Microthrix: McIlroy et al., 2013, ISME J 7:1161-72
  • 12. How do we get the genomes? What you think you study What you actually study
  • 13. How do we get the genomes? Culturing Few microorganisms can be easily cultured (<<5%) Single cell genomics Only routinely performed in specialized labs Very incomplete genomes (mean 40%, range 10-90%) www.bigelow.org
  • 14. How do we get the genomes? Culturing Few microorganisms can be easily cultured (<<5%) Single cell genomics Only routinely performed in specialized labs Very incomplete genomes (mean 40%, range 10-90%) www.bigelow.org Metagenomics
  • 15. What is a genome? Genome = Parts list of a single species
  • 16. What is a metagenome? Photo: D. Kunkel; color, E. Latypova Metagenome = Parts list of the community
  • 17. What is a metagenome? “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998
  • 18. Metagenomics is hot! “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998 PubMed: metagenom*[Title/Abstract]
  • 19. Sequencing is cheap! “...functional analysis of the collective genomes of soil microflora, which we term the metagenome of the soil.” - J. Handelsman et al., 1998 Sequencing costs PubMed: metagenom*[Title/Abstract] http://www.genome.gov/sequencingcosts/
  • 20. Metagenomics Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) Assembly 100-150 bp Search against database Contigs 1000+ bp
  • 21. Metagenomics Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) Assembly 100-150 bp Search against database Contigs 1000+ bp Phylogenetic classification Who is there? Bacterium A Bacterium B ... Bacterium X Functional classification What can they do? Gene A Gene B ... Gene X
  • 22. Metagenomics Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) Assembly 100-150 bp Search against database Contigs 1000+ bp Phylogenetic classification Who is there? Bacterium A Bacterium B ... Bacterium X Functional classification What can they do? Omics requires good reference genomes! Gene A Gene B ... Gene X
  • 23. “If you want to understand the ecosystem you need to understand the individual species in the ecosystem”
  • 24. Metagenomics Lion + Eagle ≠ Flying Lion
  • 25. Metagenomics Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) 100-150 bp Assembly Why not full genomes? Contigs 1000+ bp
  • 26. Metagenomics Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) 100-150 bp Assembly Why not full genomes? Contigs 1000+ bp 1. Micro-diversity 2. Separation of genomes (Binning)
  • 27. Micro-diversity Not 1 strain AAAAAAAAAAAAAA AAAAAAAAATAAAA AAAAAAAAACAAAA What you get TAAAA Assembly AAAAAAAAA AAAAA CAAAA Many closely related strains
  • 28. Micro-diversity High micro-diversity Low micro-diversity Short term enrichment
  • 29. Binning Reads DNA extraction Sequencing 100++ Abundant species (≈3 Mbp each) 100-150 bp Assembly Why not full genomes? Contigs 1000+ bp 1. Micro-diversity 2. Separation of genomes (Binning)
  • 30. Binning PhD student “Binning” Complex sample
  • 31. Binning Genomic signatures (e.g. GC and codon usage) Tetranucleotide frequency + statistical method PhD student “Binning” Complex sample
  • 32. Binning Genomic signatures (e.g. GC and codon usage) Tetranucleotide frequency + statistical method PhD student “Binning” Complex sample Short pieces of DNA sequences (1-10 kbp) Local sequence divergence
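
The genomic signatures named on slides 31-32 can be illustrated with a short sketch. It computes GC content and a tetranucleotide frequency vector for one contig. The function names are hypothetical, and the "statistical method" that would cluster these vectors into bins is not specified on the slides, so it is left out.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the relative frequency of each 4-mer in seq as a 256-item list."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of A, C, G, T are counted; windows with N are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def gc_content(seq):
    """Return the fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
```

On short contigs (the 1-10 kbp range mentioned on slide 32) many of the 256 frequencies rest on very few observations, which is one way to read the slide's warning about short sequences and local sequence divergence.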
  • 33. “Metagenomics can be used to measure the abundance of the organisms in the original sample.”
  • 34. Binning Original sample Sequencing Metagenome reads Assembly Abundance Scaffolds Mapping 3x 1x 1x
  • 35. Binning Original sample Sequencing Metagenome reads Assembly Abundance Scaffolds Mapping 3x 1x 1x
  • 36. Binning Abundance Sequence composition-independent binning Sample 1
  • 37. Binning Abundance Abundance Sequence composition-independent binning Sample 1 Sample 2
  • 38. Binning Abundance Abundance Sequence composition-independent binning Sample 2 Abundance Sample 2 Sample 1 Abundance Sample 1
  • 39. Binning 1. Reduce micro-diversity Abundance Sample 2 2. Use multiple related samples Abundance Sample 1
  • 40. Binning 1. Reduce micro-diversity Abundance Sample 2 2. Use multiple related samples Abundance Sample 2 Abundance Sample 1 Abundance Sample 1
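
Slides 34-40 describe binning by abundance: scaffolds from the same genome should show the same coverage in each sample, so plotting coverage in sample 1 against coverage in sample 2 separates genomes without using sequence composition. The sketch below is a minimal illustration of that idea. The scaffold names, coverage values and the rectangular selection window are all made up for the example, and the selection rule is an assumption rather than the procedure used in the talk.

```python
def coverage(mapped_bases, scaffold_length):
    """Mean coverage: bases of reads mapped to a scaffold divided by its length."""
    return mapped_bases / scaffold_length


def select_bin(scaffolds, sample1_range, sample2_range):
    """Return names of scaffolds whose coverage lies inside both ranges."""
    lo1, hi1 = sample1_range
    lo2, hi2 = sample2_range
    return [
        name
        for name, (cov1, cov2) in scaffolds.items()
        if lo1 <= cov1 <= hi1 and lo2 <= cov2 <= hi2
    ]


# Hypothetical scaffolds: name -> (coverage in sample 1, coverage in sample 2).
scaffolds = {
    "scaffold_1": (30.0, 10.0),
    "scaffold_2": (31.5, 9.4),
    "scaffold_3": (10.2, 10.1),  # matches the first two in sample 2 only
    "scaffold_4": (9.8, 29.7),
}

# Keep scaffolds at roughly 30x in sample 1 and roughly 10x in sample 2.
genome_a = select_bin(scaffolds, (25, 35), (8, 12))
```

With a single sample, scaffold_3 could not be told apart from scaffold_1 and scaffold_2 using sample 2 coverage alone; the second sample is what separates them.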
  • 41. Binning • Nitrospira enrichment running for years • 3 dominant species • No micro-diversity H. Daims & C. Dorninger, DOME, University of Vienna
  • 42. SBR reactor Full-scale EBPR plant Short term enrichment Days Albertsen et al., 2013 Nat. Biotech. 1. Reduction of (micro)-diversity
  • 43. SBR reactor Full-scale EBPR plant Short term enrichment 2. Two different DNA extraction methods Albertsen et al., 2013 Nat. Biotech.
  • 44. Colored using a set of 100 phylogenetic marker genes Albertsen et al., 2013 Nat. Biotech.
  • 45. Colored using a set of 100 phylogenetic marker genes TM7-1 (1.6%) TM7-2 (0.7%) TM7-3 (0.2%) TM7-4 (0.06%) Albertsen et al., 2013 Nat. Biotech.
  • 46. Colored using a set of 100 phylogenetic marker genes Zoom on target TM7-2 (0.7%) Albertsen et al., 2013 Nat. Biotech.
  • 47. Colored using a set of 100 phylogenetic marker genes Zoom on target PCA on genomic signatures PC2 TM7-2 (0.7%) TM7-2 PC1 Albertsen et al., 2013 Nat. Biotech.
  • 48. Colored using a set of 100 phylogenetic marker genes Candidatus Saccharimonas aalborgensis TM7-1 (1.6%) Candidate phylum TM7 Saccharibacteria Albertsen et al., 2013 Nat. Biotech.
  • 49. Genome validation Assembly inspection Essential single copy genes Genes (HMM models) Phyla Albertsen et al., 2013 Nat. Biotech.
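
The essential single copy gene check on slide 49 can be sketched as a simple count. The sketch assumes the HMM search has already been run and reduced to a list of gene names found in the bin. The gene set and hit list used in the usage note are illustrative only, not the set used in the talk.

```python
from collections import Counter


def completeness_and_duplicates(gene_hits, essential_genes):
    """Summarise essential single copy gene hits for one genome bin.

    Returns (fraction of essential genes found at least once,
             sorted list of essential genes found more than once).
    """
    counts = Counter(g for g in gene_hits if g in essential_genes)
    duplicated = sorted(g for g, n in counts.items() if n > 1)
    return len(counts) / len(essential_genes), duplicated
```

For example, with the hypothetical set {"rpoB", "recA", "gyrB", "rpsC"} and hits ["rpoB", "recA", "recA", "gyrB"], the function reports three of four genes found and recA duplicated. A gene that should occur once but appears twice hints that the bin may contain scaffolds from more than one genome.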
  • 50. In situ confirmation PL. Larsen, SJ. McIlroy
  • 51. Multi-metagenome http://madsalbertsen.github.io/multi-metagenome/ Short: goo.gl/0ctA3 • Guides • Workflow scripts • Example data • All the code • Recommendations R markdown enables reproducible and transparent genome extractions Albertsen et al., 2013 Nat. Biotech.
  • 52. It’s just a potential! ..and a poor description of it.
  • 53. Competibacter GAO989 Competibacter has the potential to negatively influence phosphorus removal in wastewater treatment. Literature disagreement on glycolytic pathways with consequences for modeling. McIlroy and Albertsen et al., 2013, ISME J (AOP). Candidatus Competibacter odensis (44%)
  • 54. Competibacter FISH with Competibacter specific probe McIlroy and Albertsen et al., 2013, ISME J (AOP). MAR with H3-labeled glucose
  • 55. Obtaining genomes is easy… … but they are useless without high quality annotations, in situ validations and good questions!
  • 56. Questions? ma@bio.aau.dk @MadsAlbertsen85 MadsAlbertsen Per H. Nielsen Simon J. McIlroy Søren M. Karst EB group University of Queensland C. Dorninger H. Daims G.W. Tyson P. Hugenholtz University of Vienna
  • 57. Databases Contigs Databases Annotated metagenome ...you only see what is in the database
  • 58. What is in the databases? Finished genomes in IMG vs. Greengenes 16S rRNA database (genomes / 16S): Phyla 29 / 90; Class 46 / 249; Order 100 / 405; Species 1268 / 99322*. *97% clustering. Note: only including 1 strain per species
  • 59. MG-RAST example Contigs 650,000 EBPR proteins with taxonomy assigned How similar are they to the genomes in the database?
  • 60. Sludge microbes vs. Database genomes 650,000 EBPR proteins Note: not abundance weighted
  • 61. Sludge microbes vs. Database genomes 650,000 EBPR proteins 1,260,000 Human gut Qin et al., 2010 Nature RAST ID: 4448044.3 Note: not abundance weighted
  • 62. Sludge microbes vs. Database genomes The 7 genera with most EBPR proteins assigned
  • 63. Effect of missing genomes What is the effect of not having closely related genomes in the database? 1. Remove a genome from the database 2. Search the removed genome against the database
  • 64. Effect of missing genomes Accumulibacter phosphatis blastp 4326 proteins Best hit Related genomes Bacteria 1268 Proteobacteria 564 Betaproteobacteria 84 Rhodocyclales 5 Rhodocyclaceae 5
  • 65. Effect of missing genomes Accumulibacter phosphatis blastp Azoarcus 4326 proteins Best hit Related genomes Bacteria 1268 Proteobacteria 564 Betaproteobacteria 84 Rhodocyclales 5 Rhodocyclaceae 5
  • 66. Effect of missing genomes Accumulibacter phosphatis blastp 4326 proteins MEGAN LCA Lowest common ancestor (LCA) approach: Hit 1: Beta-proteobacteria 80% ID Hit 2: Gamma-proteobacteria 79% ID Hit 3: Actinobacteria 59% ID Assigned to Proteobacteria Related genomes Bacteria 1268 Proteobacteria 564 Betaproteobacteria 84 Rhodocyclales 5 Rhodocyclaceae 5
  • 67. Effect of missing genomes Accumulibacter phosphatis blastp 4326 proteins MEGAN LCA Lowest common ancestor (LCA) approach: Hit 1: Beta-proteobacteria 80% ID Hit 2: Gamma-proteobacteria 79% ID Hit 3: Actinobacteria 59% ID Bacteria 325 Beta- 853 Genus 4326 proteins: • 27% correctly classified on genus level • 54% not assigned the correct class • 101 genera identified Rhodocyclaceae 1149 Assigned to Proteobacteria Proteobacteria 860 Related genomes Bacteria 1268 Proteobacteria 564 Betaproteobacteria 84 Rhodocyclales 5 Rhodocyclaceae 5 No hits 261
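
The lowest common ancestor example on slides 66-67 can be reproduced with a small sketch. It keeps the hits that are close to the best hit and assigns the protein to the deepest taxon those hits share. How MEGAN itself filters hits is not described on the slides; the percent identity window used here, the function names and the simplified lineages are assumptions made for illustration.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages (root-to-leaf lists)."""
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def assign_taxon(hits, window=10.0):
    """Assign a taxon from (percent identity, lineage) hits.

    Hits within `window` percentage points of the best hit are kept,
    and the lowest common ancestor of their lineages is returned.
    """
    if not hits:
        return None
    best = max(identity for identity, _ in hits)
    kept = [lineage for identity, lineage in hits if identity >= best - window]
    return lowest_common_ancestor(kept)


# The three hits from the slide, with simplified lineages.
hits = [
    (80.0, ["Bacteria", "Proteobacteria", "Betaproteobacteria"]),
    (79.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]),
    (59.0, ["Bacteria", "Actinobacteria"]),
]
print(assign_taxon(hits))  # Proteobacteria
```

The third hit falls outside the window, so the first two decide the result and the protein is assigned to Proteobacteria, as on the slide. Widening the window until all three hits are kept pushes the assignment up to Bacteria, which shows how the missing close relatives in the database lead to shallow classifications.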
  • 68. Effect of missing genomes Phylum Nitrospira defluvii blastp 4268 proteins: • 1% correctly classified on phylum level MEGAN LCA Related genomes Bacteria Nitrospirae 1268 3
  • 69. Effect of missing genomes Nitrospira defluvii blastp MEGAN LCA + KEGG What about function? Related genomes Bacteria Nitrospirae 1268 3
  • 70. Effect of missing genomes Nitrospira defluvii blastp MEGAN LCA + KEGG Related genomes Bacteria Nitrospirae 1268 3
  • 71. Effect of missing genomes Nitrospira defluvii blastp MEGAN LCA + KEGG Related genomes Bacteria Nitrospirae 1268 3
  • 72. Implication of missing genomes Function A Function B Function C Function D
  • 73. Pitfalls You always get billions of data!