This is a one-hour lecture about metagenomics, focusing on the discovery of viruses and unknown sequence elements. It is part of a one-day workshop on metagenome assembly of crAssphage, a bacteriophage found in the human gut. The hands-on workflow can be found at http://tbb.bio.uu.nl/dutilh/CABBIO/ and should be doable in one afternoon with supervision. There is also an IPython notebook about this here: https://github.com/linsalrob/CrAPy
11. Metagenomics 2.0: genomes from metagenomes
• Reference databases fail for most environmental metagenomes
– “Dark matter”: sequences not in database
• Homology searches fail for many short sequencing reads
– Fast read alignment tools place upper limit on evolutionary distance
• Interpretation fails for taxonomic and functional metagenomic profiles
– Do functions co-occur in a genome?
– To describe interactions between species you need species/genome-level resolution
• Solution: assembly and binning of (draft) genomes from metagenomes
12. “Depending on how they are viewed, the unknowns can represent either a formidable challenge or a treasure trove for virus discovery.”
Mokili, Rohwer and Dutilh Curr. Opin. Virology 2012
13. Unknowns in viral metagenomes
Mokili, Rohwer and Dutilh Curr. Opin. Virology 2012
16. De novo assembled contigs: 6,988 de novo cross-contigs
[Figure: number of contigs (log scale, 1 to 10,000) against number of samples contributing reads to contig (1 to 12)]
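The count behind a plot like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `read_counts` layout, the function names, and the toy numbers are hypothetical and not taken from the workshop workflow, and a sample is assumed to "contribute" to a contig when at least one of its reads was assembled into it.

```python
from collections import Counter


def samples_per_contig(read_counts):
    """Map each contig to the number of samples contributing at least one read."""
    return {
        contig: sum(1 for n in counts.values() if n > 0)
        for contig, counts in read_counts.items()
    }


def contig_histogram(read_counts):
    """Count contigs by the number of samples that contribute reads to them."""
    return Counter(samples_per_contig(read_counts).values())


# Hypothetical toy data: contig -> {sample: number of reads in the contig}
toy_counts = {
    "contig_1": {"F1M": 12, "F1T1": 0, "F1T2": 3},
    "contig_2": {"F1M": 0, "F1T1": 7, "F1T2": 0},
    "contig_3": {"F1M": 5, "F1T1": 2, "F1T2": 9},
}

histogram = contig_histogram(toy_counts)
for n_samples in sorted(histogram):
    print(n_samples, histogram[n_samples])
```

Contigs that draw reads from more than one sample are the cross-contigs; plotting the histogram on a log scale gives a figure of the kind shown on this slide.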
17. 220 contigs present in ≥9 samples
[Figure: average depth (log scale, 1 to 10,000) per sample; samples F1M, F1T1, F1T2 (Family 1), F2M, F2T1, F2T2 (Family 2), F3M, F3T1, F3T2 (Family 3), F4M, F4T1, F4T2 (Family 4)]
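Selecting widely shared contigs and estimating their depth per sample could look roughly like the sketch below. The threshold of 9 samples comes from the slide; everything else is an assumption made for illustration: the function names, the toy read counts, the fixed read length, and the definition of average depth as read count times read length divided by contig length.

```python
SAMPLES = [
    "F1M", "F1T1", "F1T2", "F2M", "F2T1", "F2T2",
    "F3M", "F3T1", "F3T2", "F4M", "F4T1", "F4T2",
]


def average_depth(read_count, read_length, contig_length):
    """Approximate average depth: total read bases divided by contig length."""
    return read_count * read_length / contig_length


def widespread_contigs(read_counts, min_samples=9):
    """Return contigs with at least one read in min_samples or more samples."""
    return [
        contig
        for contig, counts in read_counts.items()
        if sum(1 for n in counts.values() if n > 0) >= min_samples
    ]


# Hypothetical toy data: contig_a has reads in 10 samples, contig_b in 3.
toy_counts = {
    "contig_a": {s: (200 if i < 10 else 0) for i, s in enumerate(SAMPLES)},
    "contig_b": {s: (50 if i < 3 else 0) for i, s in enumerate(SAMPLES)},
}

selected = widespread_contigs(toy_counts, min_samples=9)
depth_f1m = average_depth(toy_counts["contig_a"]["F1M"], 100, 10_000)
```

The source does not state how presence in a sample was defined, so the "at least one read" rule here is a placeholder; a minimum depth cutoff would be a reasonable alternative.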