Pathway Signature Genes


In this presentation, the concept of pathway signature genes is explained. These signatures are used in predicting the abundance of pathways in a metagenomically sequenced community.
This presentation was given at the M3 Special Interest Group during ISMB 2009 in Stockholm.


Pathway Signature Genes

  1. Pathway prediction in (meta)genomes using Pathway Signature Genes
     Lucas Brouwers, MSc student, Nijmegen
  2. Goal
     How to predict the metabolic capacity in a metagenomic sample, given incomplete data?
     Current practice:
       • Percentage of sequences assigned to pathways used for estimating pathway abundance
     2 out of 5 sequences map to pathway X: abundance of X is 40%
     We propose to identify the (metabolic) pathways present in a community, making full use of the metagenomic data available
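The current practice described on this slide amounts to a simple read fraction. A minimal sketch, assuming each read has already been assigned to at most one pathway (the function name and the input format are hypothetical, not taken from the slides):

```python
from collections import Counter


def pathway_abundance(read_assignments):
    """Return the fraction of reads assigned to each pathway.

    read_assignments: one entry per sequenced read, holding a pathway
    name, or None when the read maps to no pathway (assumed input format).
    """
    total = len(read_assignments)
    if total == 0:
        return {}
    counts = Counter(p for p in read_assignments if p is not None)
    return {pathway: n / total for pathway, n in counts.items()}


# Slide example: 2 out of 5 sequences map to pathway X.
reads = ["X", None, "X", None, None]
abundance = pathway_abundance(reads)
print(abundance["X"])  # 0.4
```

Reads that map to no pathway still count in the denominator, which is why incomplete data pulls every estimate down.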
  3. Approach
     We use presence of OGs to predict presence of pathways
     Some species have pathway X
     [Figure: species with and without pathway X (PWY X) and the OGs A, B and C, with the labels presence signature, weak signature and absence signature]
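The three labels on this slide can be illustrated with a simple co-occurrence comparison. This is only a sketch of the idea, not the scoring used in the presentation: the function name, the set-based inputs and the `margin` cut-off are all assumptions made for illustration.

```python
def signature_type(og_species, pathway_species, all_species, margin=0.2):
    """Label an OG as a presence, absence or weak signature for a pathway.

    Compares how often the pathway occurs among species carrying the OG
    with how often it occurs among all species. `margin` is a hypothetical
    cut-off chosen for illustration only.
    """
    background = len(pathway_species & all_species) / len(all_species)
    carriers = og_species & all_species
    if not carriers:
        return "weak"  # an OG seen in no species carries no signal
    with_og = len(carriers & pathway_species) / len(carriers)
    if with_og - background > margin:
        return "presence"
    if background - with_og > margin:
        return "absence"
    return "weak"


# Toy data: six species, three of which have the pathway.
species = {"s1", "s2", "s3", "s4", "s5", "s6"}
has_pathway = {"s1", "s2", "s3"}
print(signature_type({"s1", "s2", "s3"}, has_pathway, species))  # presence
print(signature_type({"s4", "s5", "s6"}, has_pathway, species))  # absence
print(signature_type({"s1", "s4"}, has_pathway, species))        # weak
```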
  4. Approach
     630 species and their OGs
     1,200 (metabolic) pathways
  5. Signature genes add information
     Do the signatures add information on the presence of a pathway?
     (after all, some pathways are rare, others are ubiquitous)
     Specific information defined as a difference in Shannon entropy: how much extra information does the presence of an OG give about the presence of a pathway?
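One way to read the entropy difference on this slide is: the entropy of pathway presence over all species, minus the entropy of pathway presence among the species that carry the OG. The sketch below follows that reading; the function names and the set-based inputs are assumptions, and the exact definition used in the presentation may differ.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a binary variable that is true with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def specific_information(og_species, pathway_species, all_species):
    """Entropy of pathway presence over all species minus the entropy
    among the species that carry the OG (in bits)."""
    prior = len(pathway_species & all_species) / len(all_species)
    carriers = og_species & all_species
    if not carriers:
        return 0.0  # an OG seen in no species tells us nothing
    posterior = len(carriers & pathway_species) / len(carriers)
    return entropy(prior) - entropy(posterior)


# Toy data: the pathway occurs in 4 of 8 species (1 bit of uncertainty).
species = {f"s{i}" for i in range(8)}
has_pathway = {"s0", "s1", "s2", "s3"}

# An OG found only in species with the pathway removes all uncertainty.
print(specific_information({"s0", "s1"}, has_pathway, species))  # 1.0
# An OG found in every species adds nothing.
print(specific_information(species, has_pathway, species))       # 0.0
```

Under this reading, an OG found only in species without the pathway also scores 1 bit, so the entropy difference measures how informative an OG is but not whether it signals presence or absence. The value can also be negative when a pathway is rare or ubiquitous and the OG's carriers are more evenly split.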
  6. What makes a signature?
     [Figure; annotations: N = 29,661 and N = 46,176,543]
  7. Absence signatures in a pathway
     COG0437 (221 species): Fe-S-cluster-containing hydrogenases
       • Formate dehydrogenase
       • Formylmethanofuran dehydrogenase
       • Glycolaldehyde dehydrogenase
       • Nitrate reductase
  8. Integrate signature scores with pathway prediction in metagenomes
     For quantitative analysis of pathways in metagenomic samples:
     Consider chlorophyll biosynthesis:
       • Sa: Average of all OG scores, of OGs in species without chlorophyll biosynthesis
       • Sp: Average of all OG scores, of OGs in species with …
       • Si: Average of all OG scores found in a metagenomic sample
     Correcting for genome sizes becomes …
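The slide defines the three averages but the transcript does not show how they are combined. The sketch below computes Sa, Sp and Si and then places Si on a linear scale between Sa and Sp; that linear scaling is an assumption made here for illustration, as are the function names, the toy scores and the input format. No genome-size correction is attempted.

```python
def mean_score(og_lists, og_scores):
    """Average signature score over every OG occurrence in the given OG lists."""
    scores = [og_scores[og] for ogs in og_lists for og in ogs if og in og_scores]
    return sum(scores) / len(scores) if scores else 0.0


def estimate_pathway_fraction(sample_ogs, genomes, pathway_species, og_scores):
    """Place the sample average Si between Sa and Sp (assumed linear scaling)."""
    s_p = mean_score(
        [ogs for sp, ogs in genomes.items() if sp in pathway_species], og_scores
    )
    s_a = mean_score(
        [ogs for sp, ogs in genomes.items() if sp not in pathway_species], og_scores
    )
    s_i = mean_score([sample_ogs], og_scores)
    if s_p == s_a:
        return None  # the scores cannot separate the two groups
    fraction = (s_i - s_a) / (s_p - s_a)
    return min(1.0, max(0.0, fraction))  # clip to a valid fraction


# Hypothetical per-OG scores for one pathway.
og_scores = {"OG_A": 1.0, "OG_B": -1.0, "OG_C": 0.0}
genomes = {
    "s1": ["OG_A", "OG_C"],
    "s2": ["OG_A", "OG_C"],
    "s3": ["OG_B", "OG_C"],
    "s4": ["OG_B", "OG_C"],
}
pathway_species = {"s1", "s2"}

# A sample drawn equally from species with and without the pathway.
mixed_sample = ["OG_A", "OG_C", "OG_B", "OG_C"]
print(estimate_pathway_fraction(mixed_sample, genomes, pathway_species, og_scores))  # 0.5
```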
  9. Performance in sub-sampling
       • Sample percentage of OGs from every species in STRING
       • Predict percentage of species with pathway & compare with actual occurrence
     Application to metagenomes
     Dinsdale, E. A., Edwards, R. A., Hall, D., Angly, F., Breitbart, M., Brulc, J. M., et al. (2008). Functional metagenomic profiling of nine biomes. Nature, 452(7187), 629-32.
     Different biomes:
       • Subterranean
       • Hypersaline
       • Freshwater
       • Fish
       • Coral
       • Marine
     Simulated metagenomes
       • Sample OGs according to species distribution in metagenome
       • Predict percentage of species with pathway & compare with actual occurrence
     Sampling 1%
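The sub-sampling step can be sketched as drawing a fixed fraction of OGs from every genome. This is a minimal illustration under stated assumptions: the function name and input format are hypothetical, sampling is uniform without replacement, and keeping at least one OG per non-empty genome is a choice made here, not something the slides specify.

```python
import random


def subsample_ogs(genomes, fraction, seed=0):
    """Draw a fraction of the OGs of every species, without replacement.

    genomes: mapping of species name to a set of OG identifiers (assumed format).
    A fixed seed keeps the draw reproducible.
    """
    rng = random.Random(seed)
    sampled = {}
    for species, ogs in genomes.items():
        pool = sorted(ogs)  # sort so the draw does not depend on set ordering
        # Keep at least one OG per non-empty genome (an assumption of this sketch).
        k = max(1, round(len(pool) * fraction)) if pool else 0
        sampled[species] = set(rng.sample(pool, k))
    return sampled


# Toy genomes with 200 and 100 OGs, sampled at 1% as on the slide.
genomes = {
    "s1": {f"OG{i}" for i in range(200)},
    "s2": {f"OG{i}" for i in range(100)},
}
sampled = subsample_ogs(genomes, 0.01)
print({species: len(ogs) for species, ogs in sampled.items()})  # {'s1': 2, 's2': 1}
```

The sampled OG sets can then be fed to whatever predictor is being evaluated, and the predicted pathway occurrence compared with the known occurrence in the full genomes.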
  17. Pathway description of biome diversity
      Pathway occurrence was predicted in 35 metagenomic datasets
      Principal component analysis revealed the pathways responsible for differences between the biomes
  18. Conclusions
        • Pathway signature genes allow us to interpret biomes on a pathway level
        • Genes do not need to be part of a pathway to predict its presence
        • We can quantitatively and accurately describe the pathway content of a metagenome
  19. Acknowledgements
      Bas E Dutilh
      Martijn Huynen