Computational prediction and characterization of genomic islands: insights into bacterial pathogenicity - Presentation Transcript
Morgan G.I. Langille
Department of Molecular Biology & Biochemistry
Simon Fraser University
http://tinyurl.com/genomic-islands
Genomic Island History
In the early 1990s, clusters of virulence genes were found in E. coli (Hacker, et al., 1990)
Pathogenicity Islands (PAIs)
Clusters of genes that are associated with bacterial virulence
Genomic Islands (GIs) (Hacker, et al., 2000)
Segments of a genome that are thought to have originated from a horizontal transfer event
Genomic Island Interest
Pathogenicity Islands
Adhesins
Fimbriae, intimin, etc.
Secretion Systems
Type III and Type IV
Toxins
Hemolysins, Pertussis toxin
Invasins, Modulins, and Effectors
Antibiotic Resistance Islands
Metabolic Islands
Methods for Predicting GIs
Sequence based
Abnormal sequence composition
GC% bias, dinucleotide bias, codon bias, etc.
Genomic features associated with mobile genetic elements
Direct repeats, IS elements, presence of tRNA and mobility genes (Integrases, transposases, etc.)
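As an illustration of the "abnormal sequence composition" idea, the sketch below flags windows whose GC fraction deviates from the genome-wide mean. It is a minimal example, not any published tool: the function names, window size, step, and deviation threshold are all assumptions chosen for the toy data.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, max_dev=0.10):
    """Return (start, end, gc) for windows whose GC differs from the genome mean.

    window, step, and max_dev are hypothetical illustration values.
    """
    genome_gc = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) > max_dev:
            hits.append((start, start + window, gc))
    return hits


# Toy genome: 50% GC background with an AT-only insert at positions 4000..5000.
toy = "GCAT" * 1000 + "AT" * 500 + "GCAT" * 1000
flagged = atypical_gc_windows(toy)
for start, end, gc in flagged:
    print(start, end, round(gc, 2))
```

Only the windows overlapping the AT-only insert are reported here. Real genomes are far noisier, which is one reason a single composition measure can produce both false positives and missed islands.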
Methods for Predicting GIs
Comparative genomics based
Identify genomic regions with anomalous phylogenetic patterns
Requires multiple genomes
Previous state of GI identification
Sequence based methods
Numerous methods, with continual improvements in algorithm design
Not very user friendly, and the accuracy of the various methods is not well described
Comparative based methods
Used by many researchers, but with no established method (only in-house scripts)
Few user-friendly tools are available for this type of analysis
Outline
IslandPick: A comparative genomics approach for genomic island identification
Evaluating sequence composition based genomic island prediction methods
IslandViewer: An integrated interface for computational identification and visualization of genomic islands
The role of genomic islands in the virulent Pseudomonas aeruginosa Liverpool Epidemic Strain
CRISPRs and their association with genomic islands
Mauve: whole-genome aligner
Handles genome rearrangements and inversions
Fast: aligns two genomes in under 15 minutes
Command line accessible
http://gel.ahabs.wisc.edu/mauve/
(Darling, et al., 2004)
IslandPick: Outline
Inputs: Query Genome A and comparison Genomes B, C, and D
Run Mauve pairwise: Mauve (A & B), Mauve (A & C), Mauve (A & D)
Extract unique regions
Identify overlapping unique regions
BLAST
Putative Genomic Islands
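The "identify overlapping unique regions" step can be pictured as an interval intersection: keep only the stretches of the query genome that were unique in every pairwise comparison. The sketch below assumes the unique regions have already been extracted as half-open (start, end) coordinates on the query genome. The function names, the dictionary layout, and the min_length filter are hypothetical, and this is not the IslandPick source code.

```python
def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def regions_unique_in_all(unique_by_comparison, min_length=0):
    """Regions of the query genome that are unique in every pairwise comparison."""
    lists = [sorted(v) for v in unique_by_comparison.values()]
    if not lists:
        return []
    shared = lists[0]
    for other in lists[1:]:
        shared = intersect_two(shared, other)
    return [(s, e) for s, e in shared if e - s >= min_length]


# Hypothetical unique regions of Genome A from three pairwise alignments.
unique = {
    "A vs B": [(1000, 9000), (20000, 30000)],
    "A vs C": [(2000, 12000), (21000, 29000)],
    "A vs D": [(0, 8500), (25000, 40000)],
}
candidates = regions_unique_in_all(unique, min_length=5000)
print(candidates)
```

In this toy input, the second shared region spans only 4000 positions and is dropped by the length filter, leaving one candidate region.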
Selecting Comparative Genomes
[Figure: the IslandPick workflow with an added first step, Comparative Genome Selection (using CVTree distances), which picks Genomes B, C, and D for Query Genome A]
What genomes to use?
We want to compare the query genome to other comparative genomes within certain evolutionary distances
Need a phylogenetic tree or a distance matrix for all sequenced bacterial species
CVTree
Uses matching K-strings between the proteomes of two organisms
Constructs phylogenetic trees without alignment
Avoids choosing genes for phylogenetic reconstruction
Web Server http://cvtree.cbi.pku.edu.cn
Downloadable command line executable
(Qi, et al., 2004)
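To make the alignment-free idea concrete, the sketch below counts K-strings in two toy proteomes and converts the cosine correlation of the count vectors into a distance, (1 - C) / 2. This is a deliberate simplification: the published CVTree method also subtracts a background predicted from shorter strings before comparing, and that step is omitted here, so these numbers would not match CVTree output. Function names, the value of k, and the toy sequences are assumptions.

```python
from collections import Counter
from math import sqrt


def kstring_counts(proteins, k=3):
    """Count overlapping K-strings across all protein sequences of one proteome."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def composition_distance(counts_a, counts_b):
    """Distance (1 - C) / 2, where C is the cosine correlation of two count vectors.

    With raw, non-negative counts C lies in [0, 1], so the distance lies in [0, 0.5].
    """
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each proteome must contain at least one K-string")
    dot = sum(v * counts_b[key] for key, v in counts_a.items() if key in counts_b)
    return (1 - dot / (norm_a * norm_b)) / 2


# Toy "proteomes" (hypothetical sequences, far shorter than real ones).
proteome_a = ["MKVLAAGIV", "MSTNPKPQR"]
proteome_b = ["MKVLAAGIV", "MSTNPKPQR"]
proteome_c = ["WWWWWW"]

d_same = composition_distance(kstring_counts(proteome_a), kstring_counts(proteome_b))
d_diff = composition_distance(kstring_counts(proteome_a), kstring_counts(proteome_c))
print(d_same, d_diff)
```

Because no genes have to be chosen or aligned, the same procedure can be run over every pair of sequenced genomes to fill a distance matrix.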
Example: Pseudomonas Tree
Tree built using conserved genes, Omp85 & CarB, and maximum parsimony
CVTree distances from P. syringae B728a are shown
[Figure: Pseudomonas tree. Taxa shown: P. fluorescens Pf-5, P. putida KT2440, P. fluorescens PfO-1, P. syringae tomato DC3000, P. syringae phaseolicola 1448A, P. syringae syringae B728a, P. aeruginosa PAO1, P. aeruginosa PA14, Acinetobacter ADP1. Distances shown: 0, 0.227, 0.256, 0.393, 0.397, 0.411, 0.428, 0.430, 0.481]
Determining Distance Cutoffs
Given the distances between any two species, how do we choose comparison genomes?
Maximum Distance Cutoff
Eliminates the use of genomes that have diverged too much (noise)
Minimum Distance Cutoff
Eliminates the use of genomes that have not diverged enough (very closely related strains)
Minimum Number of Genomes
Eliminates the use of too few comparative genomes
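The three cutoffs combine into a simple filter over the distances from the query genome. The sketch below is one way to express it, assuming inclusive bounds and returning None when too few genomes qualify. The function name, genome labels, and distance values are hypothetical, not taken from IslandPick or from the tree in the slides.

```python
def select_comparison_genomes(distances, min_dist, max_dist, min_genomes):
    """Pick comparison genomes whose distance from the query lies within the cutoffs.

    distances maps genome name -> distance from the query genome.
    Returns a sorted list of names, or None if fewer than min_genomes qualify.
    """
    chosen = sorted(
        name for name, d in distances.items() if min_dist <= d <= max_dist
    )
    if len(chosen) < min_genomes:
        return None  # too few comparison genomes: no prediction is attempted
    return chosen


# Hypothetical distances from a query genome.
distances = {
    "genome_B": 0.05,  # too close: below the minimum distance cutoff
    "genome_C": 0.23,
    "genome_D": 0.39,
    "genome_E": 0.41,
    "genome_F": 0.48,  # too distant: above the maximum distance cutoff
}
selected = select_comparison_genomes(distances, min_dist=0.10, max_dist=0.42, min_genomes=3)
print(selected)
```

Returning None rather than an empty list keeps "no suitable comparison set" distinct from "ran, and found nothing".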
Example: Pseudomonas Tree
Maximum Distance Cutoff = 0.42
Minimum Distance Cutoff = 0.10
Minimum Number of Genomes = 3
[Figure: the same Pseudomonas tree and CVTree distances from P. syringae B728a, with the cutoffs marked]
Predicting Similar Aged GIs
[Figure: GI insertions in the query genome; cutoff shown: 1 genome < distance X]
Accuracy of GI methods
Sequence based GI prediction methods
Only require a single genome
Can easily make false predictions
Highly expressed genes
May miss predictions
Amelioration of DNA to host genome
Source genome has same composition as host genome
Usually evaluate accuracy using simulated horizontal gene transfer events or small datasets of verified GIs
IslandPick is independent of sequence composition methods
It was used to generate a “positive” dataset of islands
Developing a Negative Dataset
To identify false positives we need a “negative” dataset that does not contain GIs
Identify regions that are conserved across several genomes using Mauve whole genome alignment
Use the same genomes as selected by IslandPick with one additional cutoff
Negative Dataset
[Figure: GI insertions in the query genome; cutoff shown: 1 genome > distance X]
IslandPick Cutoffs
118 chromosomes
771 GIs
~100 genes/strain
173 chromosomes 736 chromosomes (Langille, et al., 2008)
LES Genomic Islands (Winstanley, Langille, et al., 2008)
LES in vivo competitive index
Mutants grown for 7 days in rat lung with the wild type LES
A CI of less than 1 indicates attenuation of virulence
Four genes within prophage and GIs had a strong impact on competitiveness
(Winstanley, Langille, et al., 2008)
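The slides do not state how the competitive index was calculated. A commonly used definition, assumed here, is the mutant-to-wild-type ratio recovered after infection divided by the same ratio in the inoculum. The function name and the colony counts below are hypothetical.

```python
def competitive_index(mutant_in, wt_in, mutant_out, wt_out):
    """Competitive index under the assumed definition:
    (mutant/wild type recovered) divided by (mutant/wild type inoculated).
    """
    return (mutant_out / wt_out) / (mutant_in / wt_in)


# Hypothetical colony counts: equal inoculum, mutant recovered 50-fold less.
ci = competitive_index(mutant_in=1e6, wt_in=1e6, mutant_out=2e4, wt_out=1e6)
print(ci)
```

Under this definition, a mutant that competes as well as the wild type gives a CI of 1, and values below 1 indicate that the mutant was outcompeted.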
Overview of CRISPRs
CRISPRs: Clustered regularly interspaced short palindromic repeats
Able to provide phage resistance and block conjugation
Thought to be similar to RNAi, except that DNA (rather than RNA) appears to be the target
CRISPRs and HGT
Previous studies have shown some evidence of HGT of CRISPRs
Phylogenetic profiles of CAS genes (Haft, et al., 2005)
CRISPRs within 10 megaplasmids (Godde, et al., 2006)
CRISPRs within two prophages in Clostridium difficile (Sebaihia, et al., 2006)
Analysis of CRISPRs and GIs had not been conducted previously
CRISPRs within GIs
CRISPR predictions were obtained from CRISPRdb, http://crispr.u-psud.fr/crispr/CRISPRHomePage.php
GI predictions were taken from the union of IslandPick, IslandPath-DIMOB, and SIGI-HMM
The numbers of CRISPRs inside and outside GIs were compared
CRISPRs are over-represented in GIs
Domain of Life | Number of Genomes | Number of GIs | Proportion of Genome in GIs | Total Number of CRISPRs | Expected CRISPRs in GIs | Observed CRISPRs in GIs | Significance (Chi-square Test)*
Archaea | 49 | 298 | 3.7% | 206 | 7.7 | 14 | 0.020
Bacteria | 306 | 4874 | 6.4% | 837 | 53.3 | 114 | 8.1 x 10^-18
Archaea & Bacteria | 355 | 5172 | 6.1% | 1043 | 64.0 | 128 | 1.6 x 10^-16
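The expected counts follow from the proportion of the genome in GIs (for Archaea, 3.7% of 206 CRISPRs is about 7.7). The slide does not say exactly how the chi-square test was set up. The sketch below assumes a goodness-of-fit test on two categories (inside and outside GIs) with one degree of freedom, which gives a p-value close to 0.02 for the Archaea row and one of the same order as the reported value for the Bacteria row. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_in_vs_out(total, expected_in, observed_in):
    """Goodness-of-fit chi-square for counts inside vs. outside GIs (1 degree of freedom).

    Returns (chi2, p). For 1 degree of freedom the upper-tail probability
    equals erfc(sqrt(chi2 / 2)).
    """
    expected_out = total - expected_in
    observed_out = total - observed_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Values taken from the Archaea and Bacteria rows of the table.
chi2_archaea, p_archaea = chi_square_in_vs_out(total=206, expected_in=7.7, observed_in=14)
chi2_bacteria, p_bacteria = chi_square_in_vs_out(total=837, expected_in=53.3, observed_in=114)
print(chi2_archaea, p_archaea)
print(chi2_bacteria, p_bacteria)
```

Small differences from the tabulated significance values are to be expected, since the inputs here are the rounded figures from the slide.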
Phage genes within GIs
Many GIs are known to contain phage genes
What proportion of GI genes have links to phage?
Identified genes with “phage” in their annotation within GIs
35% of all ‘phage genes’ are within GIs (6% expected)
Phage genes are over-represented in GIs
Genomic Regions | Observed ‘phage genes’ | Expected ‘phage genes’³ | Total number of genes in region | Chi-Square Test
Inside GIs¹ | 6990 | 1264.22 | 165784 | ~0
Outside GIs¹ | 12868 | 18593.78 | 2438303 |
Archaea and CRISPRs
Prevalence of CRISPRs in Archaea genomes could result in reduced phage genes
Measure | Archaea | Bacteria
Genomes containing a CRISPR | 90% | 40%
Proportion of phage genes | 0.10% | 0.79%
Proportion of GIs with a phage gene | 5.1% | 17.6%
GIs with CRISPRs and phage genes
Is there evidence that some CRISPRs are being transferred by phage?
GIs containing CRISPR(s) are also enriched for phage genes, suggesting that some CRISPRs are transferred by phage
Genomic Regions | Observed ‘phage genes’ | Expected ‘phage genes’³ | Total number of genes in region | Chi-Square Test
GIs containing CRISPR(s)² | 13 | 4.5 | 1500 | 5.7 x 10^-5
Outside GIs² | 812 | 820.5 | 274073 |
CRISPR conclusions
CRISPR over-representation in GIs suggests that they are being horizontally transferred
Some GIs that contain CRISPRs may have phage origins
CRISPRs in Archaea could be limiting HGT by increasing resistance to phage
Conclusions
Several advances in GI computational prediction
IslandPick, a novel automated comparative genomics based GI prediction program
Analysis of the accuracy of several sequence based GI prediction methods
IslandViewer: An integrated interface for computational identification and visualization of genomic islands
Insights into GI evolution and their pathogenicity
P. aeruginosa LES – evidence that genomic islands and prophage regions contain genes that provide a competitive advantage for infection in a chronic rat infection model.
CRISPRs and their association with genomic islands
Acknowledgements
Supervisor: Dr. Fiona Brinkman
Supervisory Committee: Dr. Baillie, Dr. Pio
P. aeruginosa LES: Craig Winstanley, Roger Levesque, Bob Hancock, Nick Thomson