Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity

Published in: Education, Technology

  1. Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity. Morgan G.I. Langille, Department of Molecular Biology & Biochemistry, Simon Fraser University. http://tinyurl.com/genomic-islands
  2. Genomic Island History
       • In the early 1990s, clusters of virulence genes were found in E. coli (Hacker, et al., 1990)
       • Pathogenicity Islands (PAIs)
           • Clusters of genes that are associated with bacterial virulence
       • Genomic Islands (GIs) (Hacker, et al., 2000)
           • Segments of a genome that are thought to have originated from a horizontal transfer event
  3. Genomic Island Interest
       • Pathogenicity Islands
           • Adhesins
               • Fimbriae, intimin, etc.
           • Secretion Systems
               • Type III and Type IV
           • Toxins
               • Hemolysins, Pertussis toxin
           • Invasins, Modulins, and Effectors
       • Antibiotic Resistance Islands
       • Metabolic Islands
  5. Genomic Island Interest
  6. Methods for Predicting GIs
       • Sequence based
           • Abnormal sequence composition
               • GC% bias, dinucleotide bias, codon bias, etc.
           • Genomic features associated with mobile genetic elements
               • Direct repeats, IS elements, presence of tRNA and mobility genes (integrases, transposases, etc.)
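As a rough illustration of the "abnormal sequence composition" idea on this slide, the sketch below flags windows whose GC fraction deviates from the genome-wide value. The function names, window size, step, and deviation threshold are hypothetical choices for a toy input; this is not the algorithm of any tool evaluated in the talk.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome: str, window: int, step: int, max_deviation: float):
    """Return (start, end, gc) for windows whose GC fraction differs from the
    genome-wide GC fraction by more than max_deviation (an assumed threshold)."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > max_deviation:
            flagged.append((start, start + window, gc))
    return flagged


# Toy genome: GC-rich backbone with an AT-rich insert in the middle.
toy_genome = "GC" * 50 + "AT" * 20 + "GC" * 50
hits = atypical_gc_windows(toy_genome, window=20, step=20, max_deviation=0.3)
```

On the toy input only the two windows covering the AT-rich insert are flagged. Real genomes would need much larger windows and a statistically motivated threshold.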
  7. Methods of Predicting GIs
       • Comparative genomics based
           • Identify genomic regions with anomalous phylogenetic patterns
           • Requires multiple genomes
  9. Previous state of GI identification
       • Sequence based methods
           • Numerous methods, with constant improvement of algorithm design
           • Not very user friendly, and the accuracy of the various methods is not well described
       • Comparative based methods
           • Used by many researchers, but with no established method (only in-house scripts)
           • Limited access to user friendly tools for this type of analysis
  10. Outline
       • IslandPick: A comparative genomics approach for genomic island identification
       • Evaluating sequence composition based genomic island prediction methods
       • IslandViewer: An integrated interface for computational identification and visualization of genomic islands
       • The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain
       • CRISPRs and their association with genomic islands
  11. Outline (repeated). Next section: IslandPick: A comparative genomics approach for genomic island identification
  12. Mauve: whole-genome aligner (Darling, et al., 2004)
       • Allows genome rearrangements and inversions
       • Fast: aligns two genomes in < 15 minutes
       • Command line accessible
       • http://gel.ahabs.wisc.edu/mauve/
  13. IslandPick: Outline
       [Flowchart labels: Query Genome A; Genome B; Genome C; Genome D; Run Mauve (A & B; A & C; A & D); Extract unique regions; Identify overlapping unique regions; BLAST; Putative Genomic Islands]
  14. Selecting Comparative Genomes
       [Flowchart labels: Query Genome A; Comparative Genome Selection (using CVTree distances); Genome B; Genome C; Genome D; Run Mauve (A & B; A & C; A & D); Extract unique regions; Identify overlapping unique regions; BLAST; Putative Genomic Islands]
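The "extract unique regions" and "identify overlapping unique regions" steps can be sketched as an interval intersection. This is only an illustration under assumptions: the coordinates are invented, `min_length` is a hypothetical filter, and the Mauve and BLAST steps themselves are not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the query genome


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regions_unique_in_all(unique_per_comparison: List[List[Interval]],
                          min_length: int) -> List[Interval]:
    """Keep query regions that are unique in every pairwise comparison
    and at least min_length long (min_length is an assumed parameter)."""
    if not unique_per_comparison:
        return []
    shared = sorted(unique_per_comparison[0])
    for regions in unique_per_comparison[1:]:
        shared = intersect(shared, sorted(regions))
    return [(s, e) for s, e in shared if e - s >= min_length]


# Invented unique regions of query genome A versus genomes B, C, and D.
unique_vs_b = [(1000, 9000), (20000, 26000)]
unique_vs_c = [(2000, 9500), (40000, 45000)]
unique_vs_d = [(1500, 8500), (21000, 25000)]
candidates = regions_unique_in_all([unique_vs_b, unique_vs_c, unique_vs_d],
                                   min_length=5000)
```

Only the region absent from all three comparison genomes survives; a region missing from just one or two of them is discarded.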
  15. What genomes to use?
       • We want to compare the query genome to other comparative genomes within certain evolutionary distances
       • Need a phylogenetic tree or a distance matrix for all sequenced bacterial species
  16. CVTree (Qi, et al., 2004)
       • Uses matching K-strings between the proteomes of two organisms
       • Constructs phylogenetic trees without alignment
       • Avoids choosing genes for phylogenetic reconstruction
       • Web Server http://cvtree.cbi.pku.edu.cn
       • Downloadable command line executable
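To make the K-string idea concrete, here is a deliberately simplified sketch of an alignment-free distance between two proteomes. It counts K-strings and converts the cosine similarity of the count vectors into a distance of the form (1 - C) / 2. The published CVTree method additionally subtracts a Markov-model background before computing the correlation; that step is omitted here, so the values will not match CVTree output. The function names and the default `k` are assumptions.

```python
import math
from collections import Counter


def kstring_counts(proteins, k):
    """Count every overlapping K-string across a list of protein sequences."""
    counts = Counter()
    for protein in proteins:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    return counts


def composition_distance(proteome_a, proteome_b, k=3):
    """Simplified composition distance: (1 - cosine similarity) / 2.
    No background subtraction is applied (unlike the published method)."""
    a = kstring_counts(proteome_a, k)
    b = kstring_counts(proteome_b, k)
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each proteome needs at least one K-string")
    cosine = dot / (norm_a * norm_b)
    return (1.0 - cosine) / 2.0


# Toy proteomes (invented sequences).
same = composition_distance(["MKVLAAGT"], ["MKVLAAGT"])
disjoint = composition_distance(["AAAA"], ["CCCC"])
```

Identical proteomes give a distance near 0, and proteomes sharing no K-strings give 0.5, the maximum for non-negative count vectors.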
  17. Example: Pseudomonas Tree
       • Tree built using conserved genes, Omp85 & CarB, and maximum parsimony
       • CVTree distances from P. syringae B728a are shown
       [Tree figure. Tip labels: P. fluorescens Pf-5; P. putida KT2440; P. fluorescens PfO-1; P. syringae tomato DC3000; P. syringae phaseolicola 1448A; P. syringae syringae B728a; P. aeruginosa PAO1; P. aeruginosa PA14; Acinetobacter ADP1. Distance labels: 0.227, 0.256, 0.397, 0.393, 0.411, 0.428, 0.430, 0, 0.481]
  18. Determining Distance Cutoffs
       • Given the distances between any two species, how do we choose comparison genomes?
           • Maximum Distance Cutoff
               • Eliminates the use of genomes that have diverged too much (noise)
           • Minimum Distance Cutoff
               • Eliminates the use of genomes that have not diverged enough (very closely related strains)
           • Minimum Number of Genomes
               • Eliminates the use of too few comparative genomes
  19. Example: Pseudomonas Tree
       • Maximum Distance Cutoff = 0.42
       • Minimum Distance Cutoff = 0.10
       • Minimum Number of Genomes = 3
       [Tree figure. Tip labels: P. fluorescens Pf-5; P. putida KT2440; P. fluorescens PfO-1; P. syringae tomato DC3000; P. syringae phaseolicola 1448A; P. syringae syringae B728a; P. aeruginosa PAO1; P. aeruginosa PA14; Acinetobacter ADP1. Distance labels: 0.227, 0.256, 0.397, 0.393, 0.411, 0.428, 0.430, 0, 0.481]
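The three cutoffs can be applied as a simple filter, sketched below with the cutoff values from this slide. The genome labels are placeholders because the transcript does not preserve which distance belongs to which tree tip; treating the cutoffs as inclusive and returning an empty list when too few genomes pass are assumptions.

```python
def select_comparison_genomes(distances, max_distance, min_distance, min_genomes):
    """Return the genome names whose distance from the query lies within
    [min_distance, max_distance], nearest first, or an empty list if fewer
    than min_genomes qualify (inclusive bounds are an assumption)."""
    selected = sorted(
        (name for name, d in distances.items() if min_distance <= d <= max_distance),
        key=lambda name: distances[name],
    )
    return selected if len(selected) >= min_genomes else []


# Distance values as listed on the slide (the query's own distance of 0 is
# left out); the labels genome_1..genome_8 are placeholders.
distances_from_query = {
    "genome_1": 0.227, "genome_2": 0.256, "genome_3": 0.397, "genome_4": 0.393,
    "genome_5": 0.411, "genome_6": 0.428, "genome_7": 0.430, "genome_8": 0.481,
}
selected = select_comparison_genomes(
    distances_from_query, max_distance=0.42, min_distance=0.10, min_genomes=3
)
```

With these values, five genomes fall inside the window, which satisfies the minimum of three.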
  20. Predicting Similar Aged GIs
       [Diagram labels: Query Genome; GI Insertion; 1 genome < distance X]
  21. Outline (repeated). Next section: Evaluating sequence composition based genomic island prediction methods
  22. Accuracy of GI methods
       • Sequence based GI prediction methods
           • Only require a single genome
           • Can easily make false predictions
               • Highly expressed genes
           • May miss predictions
               • Amelioration of DNA to host genome
               • Source genome has same composition as host genome
       • Accuracy is usually evaluated using simulated horizontal gene transfer events or small datasets of verified GIs
       • IslandPick is independent of sequence composition methods
           • Generated a “positive” dataset of islands
  23. Developing a Negative Dataset
       • To identify false positives we need a “negative” dataset that does not contain GIs
       • Identify regions that are conserved across several genomes using Mauve whole genome alignment
       • Use the same genomes as selected by IslandPick with one additional cutoff
  24. Negative Dataset
       [Diagram labels: Query Genome; GI Insertion; 1 genome > distance X]
  25. IslandPick Cutoffs
  26. (Langille, et al., 2008)
       • 118 chromosomes
       • 771 GIs
       • ~100 genes/strain
       [Figure labels: 173 chromosomes; 736 chromosomes]
  27. GI Prediction Accuracy
       [Diagram labels: Entire Genome; Positive Dataset; Negative Dataset; Predicted Dataset; TP; FP; FN; TN]
       • Precision = TP / (TP + FP)
       • Recall = TP / (TP + FN)
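The precision and recall formulas on this slide can be evaluated per nucleotide against the positive and negative datasets, as in the sketch below. The interval coordinates are invented, predictions falling outside both datasets are ignored, and the overall accuracy formula (TP + TN) / (TP + FP + FN + TN) is an assumption, since the transcript does not define it.

```python
def positions(intervals):
    """Expand half-open [start, end) intervals into a set of nucleotide positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def evaluate(predicted, positive, negative):
    """Nucleotide-level precision, recall, and (assumed) overall accuracy."""
    pred, pos, neg = positions(predicted), positions(positive), positions(negative)
    tp = len(pred & pos)   # predicted and in the positive dataset
    fp = len(pred & neg)   # predicted but in the negative dataset
    fn = len(pos - pred)   # positive dataset nucleotides that were missed
    tn = len(neg - pred)   # negative dataset nucleotides correctly left alone
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Invented example: the prediction at 600-700 lies in neither dataset and is ignored.
scores = evaluate(
    predicted=[(150, 200), (300, 325), (600, 700)],
    positive=[(100, 200)],
    negative=[(300, 500)],
)
```

Expanding intervals into sets keeps the sketch short; for whole genomes an interval-based overlap count would be far more memory efficient.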
  28. GI Prediction Accuracy (Langille, et al., 2008)

       | Tool | Average number of nucleotides in GIs per genome (kb) | Precision (%) | Recall (%) | Overall Accuracy (%) |
       |---|---|---|---|---|
       | SIGI-HMM | 233 | 92 | 33.0 | 86 |
       | IslandPath/Dimob | 171 | 86 | 36 | 86 |
       | PAI IDA | 163 | 68 | 32 | 84 |
       | Centroid | 171 | 61 | 28 | 82 |
       | IslandPath/Dinuc | 444 | 55 | 53 | 82 |
       | Alien Hunter | 1265 | 38 | 77 | 71 |
       | Literature* | 639 | 100 | 87 | 96 |
  29. Outline (repeated). Next section: IslandViewer: An integrated interface for computational identification and visualization of genomic islands
  30. IslandViewer (Langille, et al., 2009)
       • Website that integrates the most accurate GI prediction programs: SIGI-HMM, IslandPath-DIMOB, and IslandPick
       • Genomic island predictions pre-calculated for all genomes
           • Automatically updated monthly
       • User genome submission available
       • IslandPick can be run using manually selected comparison genomes
       • Download data for a genomic island, a chromosome, or entire dataset
       • http://www.pathogenomics.sfu.ca/islandviewer/
  35. IslandPick: Manual genome selection
  36. User Genome Submission
  37. Outline (repeated). Next section: The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain
  38. Pseudomonas aeruginosa Liverpool Epidemic Strain (LES) (Salunkhe, et al., 2005)
       • Highly successful at colonizing cystic fibrosis (CF) patients
       • Has replaced previously established strains
       • Has caused infections in non-CF patients
       • Can cause greater morbidity in CF than other strains of P. aeruginosa
  39. LES Analysis (Winstanley, Langille, et al., 2008)
       • Genome sequenced by Sanger Centre
       • I led annotation of the genome and analysis of GIs
       • 6 Prophages
       • 5 Genomic Islands
  40. Signature-tagged mutagenesis (STM)
       • STM is a method to identify genes associated with pathogenesis
       • LES used in a chronic rat lung infection model
       • 47 genes identified by STM
       • 5 of these genes are within GIs and prophage regions
       http://www.traill.uiuc.edu/uploads/porknet/papers/LitchtensteigerPaper.pdf
  41. LES Prophage (Winstanley, Langille, et al., 2008)
  42. LES Genomic Islands (Winstanley, Langille, et al., 2008)
  43. LES in-vivo competitive index (CI) (Winstanley, Langille, et al., 2008)
       • Mutants grown for 7 days in rat lung with the wild type LES
       • A CI of less than 1 indicates attenuation of virulence
       • 4 genes within prophage and GIs had a strong impact on competitiveness
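The slide does not spell out how the CI is computed. A minimal sketch, assuming the conventional definition (the mutant-to-wild-type ratio recovered after infection divided by the same ratio in the inoculum), is shown below; the counts are invented and the function name is hypothetical.

```python
def competitive_index(mutant_in, wild_type_in, mutant_out, wild_type_out):
    """Competitive index under the conventional definition (an assumption here):
    (mutant_out / wild_type_out) / (mutant_in / wild_type_in)."""
    if min(mutant_in, wild_type_in, wild_type_out) <= 0:
        raise ValueError("input counts and wild-type output count must be positive")
    return (mutant_out / wild_type_out) / (mutant_in / wild_type_in)


# Invented colony counts: equal inoculum, mutant recovered 100-fold less often.
ci = competitive_index(mutant_in=1e6, wild_type_in=1e6,
                       mutant_out=1e4, wild_type_out=1e6)
attenuated = ci < 1
```

A mutant that competes as well as the wild type gives a CI of about 1, so values well below 1 point to a gene that matters during infection.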
  44. Outline (repeated). Next section: CRISPRs and their association with genomic islands
  45. Overview of CRISPRs
       • CRISPRs: Clustered regularly interspaced short palindromic repeats
       • Able to provide phage resistance and block conjugation
       • Thought to be similar to RNAi, except that DNA (instead of RNA) is thought to be the target
  46. CRISPRs and HGT
       • Previous studies have shown some evidence of HGT of CRISPRs
           • Phylogenetic profiles of CAS genes (Haft, et al., 2005)
           • CRISPRs within 10 megaplasmids (Godde, et al., 2006)
           • CRISPRs within two prophages in Clostridium difficile (Sebaihia, et al., 2006)
       • Analysis of CRISPRs and GIs had not been conducted previously
  47. CRISPRs within GIs
       • CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php
       • GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM
       • The numbers of CRISPRs inside and outside GIs were compared

       CRISPRs are over-represented in GIs

       | Domain of Life | Number of Genomes | Number of GIs | Proportion of Genome in GIs | Total Number of CRISPRs | Expected CRISPRs in GIs | Observed CRISPRs in GIs | Significance (Chi-square Test)* |
       |---|---|---|---|---|---|---|---|
       | Archaea | 49 | 298 | 3.7% | 206 | 7.7 | 14 | 0.020 |
       | Bacteria | 306 | 4874 | 6.4% | 837 | 53.3 | 114 | 8.1 x 10^-18 |
       | Archaea & Bacteria | 355 | 5172 | 6.1% | 1043 | 64.0 | 128 | 1.6 x 10^-16 |
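The expected counts in the table are consistent with multiplying the total number of CRISPRs by the proportion of the genome in GIs. The sketch below reproduces that calculation and a chi-square test on the inside/outside split. The footnote describing the test was not captured in the transcript, so the one-degree-of-freedom goodness-of-fit form used here is an assumption, as are the function and parameter names.

```python
import math


def crispr_enrichment(total_crisprs, observed_in_gis, fraction_genome_in_gis):
    """Expected CRISPRs in GIs plus a chi-square goodness-of-fit test
    (1 degree of freedom, assumed) on the inside/outside GI counts."""
    expected_in = total_crisprs * fraction_genome_in_gis
    expected_out = total_crisprs - expected_in
    observed_out = total_crisprs - observed_in_gis
    chi2 = ((observed_in_gis - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected_in, chi2, p_value


# Archaea row of the table: 206 CRISPRs, 14 observed in GIs, 3.7% of genome in GIs.
expected, chi2, p_value = crispr_enrichment(206, 14, 0.037)
```

Because the 3.7% proportion is rounded, the results differ slightly from the table values while staying close to them.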
  48. Phage genes within GIs
       • Many GIs are known to contain phage genes
       • What proportion of GI genes have links to phage?
       • Identified genes with “phage” in their annotation within GIs
       • 35% of all ‘phage genes’ are within GIs (6% expected)

       Phage genes are over-represented in GIs

       | Genomic Regions | Observed number of ‘phage genes’ | Expected number of ‘phage genes’³ | Total number of genes in region | Chi-Square Test |
       |---|---|---|---|---|
       | Inside GIs¹ | 6990 | 1264.22 | 165784 | ~0 |
       | Outside GIs¹ | 12868 | 18593.78 | 2438303 | |
  49. Archaea and CRISPRs
       Prevalence of CRISPRs in Archaea genomes could result in reduced phage genes

       | | Archaea | Bacteria |
       |---|---|---|
       | Genomes containing a CRISPR | 90% | 40% |
       | Proportion of phage genes | 0.10% | 0.79% |
       | Proportion of GIs with a phage gene | 5.1% | 17.6% |
  50. GIs with CRISPRs and phage genes
       • Is there evidence supporting that some CRISPRs are being transferred by phage?

       GIs containing CRISPR(s) also contain an over-representation of phage genes, suggesting that some CRISPRs are transferred by phage

       | Genomic Regions | Observed number of ‘phage genes’ | Expected number of ‘phage genes’³ | Total number of genes in region | Chi-Square Test |
       |---|---|---|---|---|
       | GIs containing CRISPR(s)² | 13 | 4.5 | 1500 | 5.7 x 10^-5 |
       | Outside GIs² | 812 | 820.5 | 274073 | |
  51. CRISPR conclusions
       • CRISPR over-representation in GIs suggests that they are being horizontally transferred
       • Some GIs that contain CRISPRs may have phage origins
       • CRISPRs in Archaea could be limiting HGT by increasing resistance to phage
  52. Conclusions
       • Several advances in GI computational prediction
           • IslandPick, a novel automated comparative genomics based GI prediction program
           • Analysis of the accuracy of several sequence-based GI prediction methods
           • IslandViewer: An integrated interface for computational identification and visualization of genomic islands
       • Insights into GI evolution and their pathogenicity
           • P. aeruginosa LES: evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model
           • CRISPRs and their association with genomic islands
  53. Acknowledgements
       • Supervisor: Dr. Fiona Brinkman
       • Supervisory Committee: Dr. Baillie, Dr. Pio
       • P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson
