
Pizza club - May 2016 - Shaman



Prediction of bacteriophage-host relationships

Published in: Science


  1. 1. Shaman Narayanasamy Eco-Systems Biology Group Supervisors: Paul Wilmes and Jorge Goncalves PHD-2014-1/7934898 Computational approaches to predict bacteriophage-host relationships Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, Bas E. Dutilh Review Article FEMS Microbiology Reviews (9 December 2015) Computational Biology Pizza Club series: 25th May 2016
  2. 2. 2 Article overview • Metagenomics for identification of viral-host associations • Introduction of wet-lab methods • Focused on bacteriophages (phages) and bacterial interactions • Benchmark data: 820 bacteriophages, associated hosts and publicly available metagenomic datasets • Assessment of predictive power of in silico phage-host signals: – Abundance-based methods – Sequence homology based methods – Genetic homology – CRISPRs – Oligonucleotide profiles – Compositional based methods
  3. 3. 3 Introduction
  4. 4. 4 Introduction Infection! Membrane receptor Figure adapted and modified from Gelbart & Knobler et al. (2008)
  5. 5. 5 Introduction Infection! Resistance Defense!!! • Membrane receptor mutation • CRISPR-Cas • Restriction-modification Membrane receptor Figure adapted and modified from Gelbart & Knobler et al. (2008)
  6. 6. 6 Introduction Infection! Resistance Defense!!! • Membrane receptor mutation • CRISPR-Cas • Restriction-modification Membrane receptor Mutation Figure adapted and modified from Gelbart & Knobler et al. (2008)
  7. 7. 7 Introduction Infection! Resistance Fitness Defense!!! • Membrane receptor mutation • CRISPR-Cas • Restriction-modification Membrane receptor Mutation Figure adapted and modified from Gelbart & Knobler et al. (2008)
  8. 8. 8 Introduction Infection! Resistance Fitness
  9. 9. 9 Introduction Infection! Resistance Fitness
  10. 10. 10 Introduction Infection! Resistance Fitness
  11. 11. 11 Introduction Competition Infection! Resistance Fitness
  12. 12. Experimental approaches for phage isolation 12 • Spot and plaque assays • Liquid assays • Viral tagging • Microfluidic PCR • PhageFISH • Single cell sequencing • Hi-C sequencing
  13. 13. Spot and plaque assays 13 Requires • Pure culture of host • Pure/environmental culture of phage Disadvantages • Low throughput • Host isolation required Photo adapted and modified from http://www.slideshare.net/Adrienna/global-food-safety2013
  14. 14. Liquid assays 14 Requires • Pure culture of host • Pure culture of phage Disadvantages • Use of OD readout * • Low sensitivity (single endpoint values) * • Host and phage isolate required (* Use redox dye, OmniLog platform and real-time/semiquantitative PCR) Figure adapted and modified from Goldberg et al. (2014)
  15. 15. Viral tagging 15 Requires • Pure culture of host • Pure culture/environmental isolate of phages • Cell sorter (FACS..?) Disadvantages • Host isolate required Figure adapted and modified from http://jgi.doe.gov/dyeing-learn-marine-viruses/
  16. 16. Microfluidic PCR 16 Requires • Environmental microbial community sample • PCR primers for target marker genes Disadvantages • Relies on marker genes for design of PCR primers Figure adapted and modified from Dang & Sullivan (2014)
  17. 17. PhageFISH 17 Requires • Environmental microbial community sample • PCR primers for target marker genes Disadvantages • Relies on marker genes for FISH probe design Figures adapted and modified from Dang & Sullivan (2014) and Allers et al. (2013)
  18. 18. Single cell sequencing 18 Requires • Single microbial cell from environmental microbial community sample Disadvantages • Biased towards most abundant environmental microbe Figure adapted and modified from Lasken (2012)
  19. 19. Benchmark dataset 19 • 820 complete phage genomes • Field: “host” • 153 complete bacterial genomes • NCBI RefSeq
  20. 20. Quality assessment of predictions: ROC curves 20 • Assessment of binary classifier (Host/Not Host) • Does not require cut-off value • Based on the rate of accumulation of true and false positives • True positive rate (Sensitivity): TPR = TP / (TP + FN) • False positive rate (1 - Specificity): FPR = FP / (FP + TN)
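The ROC construction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the review: the function names `roc_points` and `auc` are hypothetical, tied scores are not handled specially, and the input is assumed to contain at least one true host and one non-host.

```python
def roc_points(scores, labels):
    """Return ROC points as (FPR, TPR) pairs.

    scores: prediction scores, higher means "more likely host".
    labels: 1 for a true phage-host pair, 0 otherwise.
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Walk down the ranking; each step lowers the implicit cut-off.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        # TPR = TP / (TP + FN), FPR = FP / (FP + TN)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts, auc(pts))
```

Because the curve is built by sweeping over every possible threshold, no single cut-off value has to be chosen, which is the point made in the second bullet.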
  21. 21. Computational methods for phage-host signal prediction 21 • Abundance profiles • Genetic homology • CRISPR • Exact matches • Oligonucleotide profiles
  22. 22. Abundance profiles 22 • Stern et al. (2012) – Good correlation of phage-host abundance across human gut microbiome (metagenomes) • Reyes et al. (2013) – 2/5 phages correspond to decrease in host abundance (mouse gut) • Nielsen et al. (2014) – Occurrence of phage-like gene sets corresponding to host (bacterial) gene set – Includes known phage-host pairs • Dutilh et al. (2014) • 22% metagenomic reads may be of phage origin • Lima-Mendez et al. (2015); Tara Oceans survey Figure adapted and modified from Nielsen et al. (2014) and Edwards et al. (2015) • Improves with the availability of multiple samples from same/similar environments • High spatio/temporal stratification; will improve as publicly available metagenome collection increases • Time series datasets potentially used for time lagged associations • Complicated and non-linear dynamics incompatible with straightforward correlation • 12% correct identification of host
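As a rough illustration of association by correlation, the sketch below ranks candidate hosts by the Pearson correlation between a phage abundance profile and each host abundance profile across the same samples. The function names and the toy abundance values are assumptions made for this example; the studies cited on the slide use their own pipelines, and as noted above, non-linear or time-lagged dynamics will defeat a plain correlation like this one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / (sd_x * sd_y)


def predict_host_by_abundance(phage_profile, host_profiles):
    """Return the host whose profile correlates best with the phage.

    host_profiles: dict mapping a host name to its abundance per sample
    (hypothetical input format for this sketch).
    """
    return max(host_profiles,
               key=lambda host: pearson(phage_profile, host_profiles[host]))


if __name__ == "__main__":
    phage = [1.0, 2.0, 3.0, 4.0]
    hosts = {"host_A": [2.0, 4.0, 6.0, 8.0], "host_B": [4.0, 3.0, 2.0, 1.0]}
    print(predict_host_by_abundance(phage, hosts))
```

More samples from the same or similar environments give the correlation more points to work with, which is why the slide expects this signal to improve as public metagenome collections grow.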
  23. 23. Genetic homology 23 • Phage-host homology is an indication of recent common ancestry, implying interaction • Host genes may benefit phages! • Auxiliary metabolic genes • Modi et al. (2013) and Dutilh et al. (2014) Figure adapted and modified from Edwards et al. (2015) • Amino acid based searches applicable for distantly related organisms (29.8%) • Nucleotide based searches more accurate (38.5%) • 30% host identified
  24. 24. CRISPR-Cas 24 [Figure: a bacterial genome with a cas gene and a CRISPR array of repeats (R) alternating with spacers (S1 to S5), shown alongside phage genome 1 and phage genome 2. R: Repeat, Sx: Spacers]
  25. 25. CRISPRs 25 • Studies: – Human gut microbiome; Stern et al. (2012), Minot et al. (2013) – Acidophilic biofilms; Andersson & Banfield (2008) – Cow rumen; Berg Miller et al. (2012) – Arctic glacial ice and soil; Sanguino et al. (2015) – Marine environments; Anderson, Brazelton & Baross (2011), Cassman et al. (2012) – Activated sludge; Narayanasamy et al. (unpublished) • Little to no homology to known sequence • Environmentally dependent • Spacers are rapidly replaced • Most suitable for recent phage-host interactions • Not all prokaryotes encode CRISPRs (bacteria; 48 ± 30%, archaea; 63 ± 30%) • Highly specific, but not sensitive • Degeneracy of up to 13 mismatches allowed (Fineran et al., 2014) Figure adapted and modified from Edwards et al. (2015)
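To make the spacer-to-phage matching concrete, here is a small sketch that scans a phage genome on both strands for a CRISPR spacer, allowing a configurable number of mismatches. The name `spacer_hits`, its return format, and the default of zero mismatches are assumptions for illustration only; real analyses typically use an aligner such as BLAST rather than a naive scan, and the mismatch tolerance to use (the slide cites up to 13) is a judgment call.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacer_hits(spacer, genome, max_mismatches=0):
    """Find positions where a spacer matches a phage genome.

    Returns a list of (position, strand, mismatches) tuples. The scan is
    a naive sliding window, so it is only suitable for small examples.
    """
    hits = []
    k = len(spacer)
    targets = (("+", spacer), ("-", reverse_complement(spacer)))
    for strand, target in targets:
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            mismatches = sum(a != b for a, b in zip(target, window))
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


if __name__ == "__main__":
    toy_genome = "TTTACGTACGGATTT"  # made-up sequence
    print(spacer_hits("ACGTACGGA", toy_genome))
    print(spacer_hits("ACGTACGCA", toy_genome, max_mismatches=1))
```

A bacterium whose spacers hit a phage genome has probably encountered that phage recently, which explains why the signal is highly specific; hosts without a CRISPR system, or whose spacers have already turned over, produce no hit at all, hence the low sensitivity.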
  26. 26. Exact matches 26 • Integration of phage to host via homologous recombination • attP (POP′) on phage genome and attB (BOB′) on bacterial genome • Common identical core sequence (2-15 bp) between phage and host • Adjacent to integrase gene in phage genome, near tRNA gene in bacterial genomes Figure adapted and modified from Edwards et al. (2015) • Longer matches more reliable • Up to 40% of predictions from exact matches are correct
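The exact-match signal amounts to finding the longest stretch of sequence shared verbatim between a phage and a candidate host. The sketch below uses a simple dynamic-programming longest-common-substring search; the function name and toy sequences are hypothetical, only one strand is compared, and the quadratic running time means genome-scale work would need an indexed approach (for example suffix arrays or k-mer hashing) instead.

```python
def longest_exact_match(phage, host):
    """Return the longest substring shared by both sequences.

    Classic O(len(phage) * len(host)) dynamic programme that keeps only
    two rows of the table. Returns "" if nothing is shared.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in `phage`
    prev = [0] * (len(host) + 1)
    for i in range(1, len(phage) + 1):
        curr = [0] * (len(host) + 1)
        for j in range(1, len(host) + 1):
            if phage[i - 1] == host[j - 1]:
                # Extend the match that ended at the previous base pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return phage[best_end - best_len:best_end]


if __name__ == "__main__":
    toy_phage = "AAGCTTTTTTATACTAAGG"  # made-up flanks around a shared core
    toy_host = "CCGCTTTTTTATACTAACC"
    core = longest_exact_match(toy_phage, toy_host)
    print(core, len(core))
```

Ranking hosts by the length of their longest match reflects the slide's remark that longer matches are more reliable, since very short identical stretches are expected by chance in any pair of genomes.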
  27. 27. Oligonucleotide profiles 27 • Phages ameliorate genomic oligonucleotide profiles according to host • Avoid recognition by restriction enzymes • Adjustment of codon usage to match available host tRNAs • Ogilvie et al. (2013) identified 408 metagenomic fragments with phage-like properties (4-mers) [Figure labels: contig with cas gene; contig with known phage gene; contig with CRISPR locus] Figure adapted and modified from Narayanasamy et al. (unpublished) and Edwards et al. (2015) • Profiles cannot be too sparse (shorter k-mers) • k = 3-8 predicted 8-17% correct hosts • Codon usage predicted ~10% hosts correctly • GC content not informative
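A k-mer profile comparison can be sketched as follows: count every k-mer in the phage and in each candidate host, normalise to frequencies, and pick the host with the closest profile. The function names, the choice of Euclidean distance, and the toy sequences are assumptions for this example; the review may use a different distance measure, and this sketch counts one strand only.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Frequency vector over all 4**k DNA k-mers, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def predict_host_by_kmers(phage, hosts, k=4):
    """Return the host whose k-mer profile is closest to the phage's.

    hosts: dict mapping a host name to its genome sequence
    (hypothetical input format for this sketch).
    """
    phage_profile = kmer_profile(phage, k)
    return min(hosts,
               key=lambda h: euclidean(phage_profile, kmer_profile(hosts[h], k)))


if __name__ == "__main__":
    toy_hosts = {"AT_rich": "TATATATATATA", "GC_rich": "GCGCGCGCGCGC"}
    print(predict_host_by_kmers("ATATATATAT", toy_hosts, k=2))
```

The number of possible k-mers grows as 4^k, so for large k most entries of the vector are zero on a short phage genome; this is the sparsity concern behind the slide's preference for shorter k-mers.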
  28. 28. Summary and overview 28
      Signal category | Approach | Performance | Comments
      Abundance profiles | Phage-host coabundance profiles; association by correlation | 9.5% | Non-linear dynamics confound correlations
      Genetic homology | Phage-host nucleotide and protein sequence homology | 38.5% (blastn); 29.8% (blastx) | Depends on database
      CRISPRs | Spacer alignments to phage genomes | 15.1% (most similar); 21.3% (highest) | Occurrence of CRISPR system (~40% bacteria, ~70% archaea); no matches; not sensitive
      Exact matches ** | Exact matches of phage-host genomes | 40.5% | Short exact matches may be random
      Oligonucleotide profiles | Similarity of k-mer profiles of phage-host | 17.2% (4-mer); 10.4% (codon) |
      Table adapted and modified from Edwards et al. (2015)
  29. 29. Summary and overview 29 • Blastn and exact matches provide strongest signal • Most methods predict between 1 and 4 bacteria as most likely host (better than random) • Significant host genome fraction required (except for abundance-based method) • Current knowledge still limited • Phage host range (highly specific vs broad range) • New methods and technology Figure adapted and modified from Edwards et al. (2015)
  30. 30. Thank you! PHD-2014-1/7934898