Assembly of metagenomes

A talk I gave for the 2011 metagenomics course at the Biological Dept., Univ. of Oslo, April 2011

  1. Assembly of metagenomes: Lex Nederbragt; Norwegian Sequencing Center & Centre for Ecological and Evolutionary Synthesis; University of Oslo
  2. What is assembly: From reads to genome
  3. Why assembly? Wooley JC et al, PLoS Comput Biol. 2010 Feb 26;6(2):e1000667
  4. How: Find overlap between reads
  5. How: Build consensus sequence
  6. Challenges: Repetitive element; DNA; Shotgun reads; Contigs; Collapsed contig
  7. Results: Lots of pieces
  8. Mate pairs
  9. Assembly with mate pairs: Gaps; Paired reads; Contigs; Scaffold
  10. Mate pairs: Contig, Contig, Contig; Scaffold; NNNNN, NNNNN
  11. Mate pairs? 454/Illumina; Illumina; 150–600 bases
  12. Mate pairs! Longer jumps:
  13. Mate pairs: Little used for metagenomics...
  14. Why is assembly hard for metagenomes? Heterogeneous samples; many different genomes; overlap between genomes, e.g. 16S; non-species-specific contigs; http://rna.ucsc.edu/
  15. When could it work: One or a few dominating species; contigs might be species-specific
  16. Specialized software: Genovo
  17. Specialized software: Genovo; uses a 'generative probabilistic model' of read generation; assembler discovers 'likely sequence reconstructions under the model'
  18. Use your favorite assembler: Newbler (454); Velvet; Euler; SOAPdenovo; ...; tweak parameters, e.g. higher stringency for determining overlaps
  19. Check contigs for: Read depth; GC frequency; Tetranucleotide frequency
  20. Example: Read depth
  21. Challenges: Repetitive element; DNA; Shotgun reads; Contigs; Collapsed contig
  22. Results: Lots of pieces; Higher read depth; Repetitive element; DNA
  23. Example: One contig; Log scale!
  24. Example
  25. Example: Caulobacteraceae; Proteobacteria; Cyanobacteria; Bacteroides
  26. Solution: Split contigs on read depth and GC%; use BLAST
  27. Metagenomic ORFome Assembly: Gene/protein-directed assembly. Ye Y, Tang H. 2009. J Bioinform Comput Biol 7: 455-471
  28. Iterative read mapping and assembly: Align reads to a single reference genome; 'update' the reference based on alignment; align remaining reads again. Dutilh BE, Huynen MA, Strous M. 2009. Bioinformatics 25: 2878-2881.
  29. Reverse metagenomics: Leptospirillum group III never cultured; shotgun metagenomics; nitrogen fixation gene
  30. GC content and read depth: Leptospirillum group III; culturable for the first time
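
The contig checks named on slides 19 and 26 (read depth, GC frequency, tetranucleotide frequency) can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk: the function names, the median-based comparison, and the two-fold threshold are all assumptions chosen for the example.

```python
from collections import Counter
from itertools import product
from statistics import median


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers.

    Windows containing anything other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=4)}
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: n / total for kmer, n in counts.items()}


def flag_depth_outliers(depths, fold=2.0):
    """Contig names whose mean read depth is more than `fold` times
    above or below the median depth (hypothetical threshold)."""
    med = median(depths.values())
    return sorted(
        name for name, depth in depths.items()
        if depth > med * fold or depth < med / fold
    )


if __name__ == "__main__":
    # Made-up depths, for illustration only.
    depths = {"contig1": 10, "contig2": 12, "contig3": 11, "contig4": 95}
    print(flag_depth_outliers(depths))
    print(gc_fraction("ACGTGGCC"))
```

A contig flagged by read depth, or one whose GC fraction or tetranucleotide profile differs clearly from the rest, is a candidate for a collapsed repeat or for sequence from a different organism, and could then be checked with BLAST as on slide 26.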
