Comparative metagenomics: quantifying similarities between environments, CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Nijmegen, Bas E. Dutilh, Copenhagenomics 2012


  1. Comparative metagenomics: quantifying similarities between environments. Bas E. Dutilh. CPHx, Copenhagen, Denmark, June 14th 2012
  2. Metagenomic analysis tools
  3. Taxonomic or functional profiles (Trindade-Silva et al. PLoS ONE 2012; Kip et al. Env. Microbiol. Rep. 2011; Boleij et al. Mol. Cell. Proteomics 2012)
  4. Clustering profiles
     • Calculate pairwise distances
       – Manhattan distance
       – Correlation between profiles
         • High correlation ↔ similar environment
         • Low correlation ↔ dissimilar environment
       – Angle between vectors in n-dimensional space
         • Small angle ↔ similar environment
         • Large angle ↔ dissimilar environment
         • Wootters distance between profiles (Wootters Phys. Rev. D 1981)
     • Create cladogram
     [Figures: profile plot of frequency against taxa / functions; vector plot with axes freq taxon 1 and freq taxon 2]
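The distance measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: the function names and the toy counts are assumptions, profiles are taken to be relative frequencies that sum to 1, and the Wootters distance is written in its usual form as the arccosine of the summed square roots of the paired frequencies.

```python
import math

def normalize(counts):
    """Turn raw counts per taxon/function into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

def manhattan(p, q):
    """Sum of absolute frequency differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def correlation_distance(p, q):
    """1 minus the Pearson correlation (undefined for constant profiles)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    return 1.0 - cov / math.sqrt(var_p * var_q)

def angle(p, q):
    """Angle in radians between two profile vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def wootters(p, q):
    """Wootters statistical distance: angle between the square-root vectors."""
    overlap = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, overlap)))

# Toy profiles over four hypothetical taxa (made-up counts).
gut_a = normalize([50, 30, 20, 0])
gut_b = normalize([45, 35, 20, 0])
sea = normalize([0, 10, 20, 70])

for name, dist in [("manhattan", manhattan), ("1 - correlation", correlation_distance),
                   ("angle", angle), ("wootters", wootters)]:
    print(name, round(dist(gut_a, gut_b), 3), round(dist(gut_a, sea), 3))
```

With every measure, the two gut-like profiles come out closer to each other than either is to the sea-like profile. The resulting pairwise distance matrix is what a tree-building method would take as input to create the cladogram.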
  5. Microbiomes of water animals
     - BlastN reads against Genbank
     - Taxonomic profiles including parent clades
     - Wootters distance formula
     - BioNJ cladogram
     Trindade-Silva et al. PLoS ONE 2012
  6. Many unknowns in viral metagenomes (Mokili et al. Curr. Opin. Virology 2012)
  7. Highly divergent samples
     - BlastN reads against Genbank
     - Taxonomic profiles including parent clades
     - Distance = 1 minus correlation
     - BioNJ cladogram
     [Figure: % reads used (BlastN mapping to Genbank), axis from 0 to 100; sample groups: human, water]
     Dutilh et al. submitted
  8. Human microbiota well characterized*
     * The terrestrial hot spring metagenomes consist of 99.8% reads from Synechococcus
  9. Reference-independent methods
     • k-mer profiles
       – Example: GATGGATGAC → 4-mers GATG, ATGG, TGGA, GGAT, GATG, ATGA, TGAC → counts AAAA 0, ..., ATGA 1, ATGG 1, GATG 2, GGAT 1, TGAC 1, ..., TTTT 0
       – 4^k/2 entries (in this case 4^4/2 = 128)
       – Calculate profile similarities
     • Enhance with habitat k-mer signatures (HabiSign)
     • Advantages
       – Very fast to calculate
     • Disadvantages
       – A lot of information is lost
       – Biologically (rather) meaningless
     Ghosh et al. BMC Bioinformatics 2011
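The k-mer counting step on this slide can be sketched as follows. This is an illustrative sketch only, not HabiSign or any published tool: the function names and the `merge_strands` option are assumptions. Merging each k-mer with its reverse complement is what roughly halves the 4^k table. For k = 4 exact merging leaves 136 rather than 128 distinct entries, because 16 of the 256 4-mers are their own reverse complement, so 4^k/2 is best read as an approximation.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(kmer):
    """Reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]

def kmer_profile(reads, k=4, merge_strands=False):
    """Count k-mers over all reads; optionally merge each k-mer with its
    reverse complement (the lexicographically smaller one is kept)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows with ambiguous bases such as N
            if merge_strands:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts

def canonical_kmer_count(k):
    """Number of distinct entries once both strands are merged."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return len({min(km, reverse_complement(km)) for km in kmers})

# The sequence from the slide: GATG occurs twice, the other 4-mers once.
profile = kmer_profile(["GATGGATGAC"], k=4)
print(sorted(profile.items()))
print(canonical_kmer_count(4))
```

Profiles built this way for two metagenomes can then be normalized and compared with any of the profile distances used for taxonomic or functional profiles.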
  10. Cross-assembly
     • Combine sequencing reads from different metagenomes in a single assembly
       – Use your favorite assembly tool
     • Cross-contigs contain reads from more than 1 sample
       – Directly represent shared entities between samples
     • The number of reads assembled into cross-contigs determines the degree of overlap between samples
     Dutilh et al. submitted
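The idea of counting reads in cross-contigs can be illustrated with a small sketch. It assumes a hypothetical input, a dictionary from contig name to the sample label of each read assembled into it, and it computes a naive shared fraction per sample pair. It is not the crAss input format or distance formula, and it applies no correction for metagenome size or contig length.

```python
from collections import Counter
from itertools import combinations

def cross_contig_summary(contigs):
    """contigs maps a contig id to the sample label of each read in it
    (hypothetical structure). Returns assembled reads per sample and, for
    each sample pair, the reads of those two samples that sit in contigs
    containing both of them (cross-contigs for that pair)."""
    total_reads = Counter()
    shared_reads = Counter()
    for sample_labels in contigs.values():
        per_sample = Counter(sample_labels)
        total_reads.update(per_sample)
        for a, b in combinations(sorted(per_sample), 2):
            shared_reads[(a, b)] += per_sample[a] + per_sample[b]
    return total_reads, shared_reads

def shared_fraction(total_reads, shared_reads, a, b):
    """Fraction of the assembled reads of samples a and b found in their cross-contigs."""
    a, b = sorted((a, b))
    return shared_reads[(a, b)] / (total_reads[a] + total_reads[b])

# Made-up assembly result with three samples.
contigs = {
    "contig1": ["human1", "human1", "human2"],
    "contig2": ["water1", "water1"],
    "contig3": ["human1", "water1"],
    "contig4": ["human2", "human2", "human1"],
}
totals, shared = cross_contig_summary(contigs)
for a, b in combinations(sorted(totals), 2):
    print(a, b, round(shared_fraction(totals, shared, a, b), 3))
```

In this toy input the two human samples share most of their assembled reads through cross-contigs, while each shares few or none with the water sample, which is the pattern a distance based on cross-contigs is meant to capture.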
  11. http://edwards.sdsu.edu/crass/ (Dutilh et al. submitted)
  12. 2 or 3 samples compared (Dutilh et al. submitted)
  13. 4 or more samples compared
     • Clustering:
       – Calculate distance measure
         • Correct for metagenome size
         • Correct for contig length
       – Create cladogram
     [Figure: cladogram with sample groups human and water]
     Dutilh et al. submitted
  14. Similar numbers of utilized reads
     [Figure: bar chart, axis from 0 to 100, comparing crAss and BlastN for human and water samples]
     Dutilh et al. submitted
  15. Simulated metagenomes
     [Figure: panels labeled 0% to 100% in steps of 10%]
     Dutilh et al. submitted
  16. Acknowledgements
     • Robert Schmieder
     • Jim Nulton
     • Ben Felts
     • Peter Salamon
     • Robert A. Edwards
     • John L. Mokili
