
Julie Shay CCBC poster May 11 2016



The utility of draft bacterial genomes for gene function analysis and genomic island prediction

Published in: Science


The utility of draft bacterial genomes for gene function analysis and genomic island prediction

Julie A. Shay, Claire Bertelli, Bhavjinder K. Dhillon, and Fiona S.L. Brinkman
Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada

The problem: growing gap between draft and complete genomes

[Figure: thousands of genomes in database by year, 2008 to 2015, for NCBI SRA Bacterial Genomes and NCBI Complete Bacterial Genomes.]
Note: This image only shows genomes submitted to NCBI, so it underestimates the extent of the gap between draft and complete genomes.

Comparing draft vs. complete genomes: two examples

• Main data set: 36 Listeria monocytogenes isolates [1], draft Illumina genomes and the identical subsequently completed genomes
• Other data set: draft genomes from the Pseudomonas aeruginosa reference panel [2] and similar completed genomes (identical reference available for 2 strains)
• Draft genome aligned to completed reference with Mauve Contig Mover [3]

[Diagram: each isolate yields a draft genome and a complete genome, which are then compared.]

Methods

Genomic Island (GI) analysis
• Draft/complete genomes were run on IslandViewer [5], a web-based GI prediction tool which incorporates two methods:
  • SIGI-HMM [10]: codon usage bias; HMM-based method
  • IslandPath-DIMOB [11]: dinucleotide bias; presence of a mobility gene

Pipeline
• Input: contigs (GenBank format) and a user-selected reference genome
• Contig alignment to reference genome with Mauve [3]
• Concatenate contigs based on alignment
• Normal IslandViewer analysis pipeline

Gene function category analysis
• Open reading frames (ORFs) were assigned to clusters of orthologous groups (COGs) [12] using RPS-BLAST [13]
• COG superfamily distributions were compared between complete genomes and missing regions of drafts

Genes of interest
• AMR genes: identified using Resistance Gene Identifier [4] using the Comprehensive Antibiotic Resistance Database
• Virulence factors: predicted using a conservative reciprocal-best-blast-hit approach from VFDB, PATRIC, and Victor’s virulence factors [5,6,7]
• tRNA genes: predicted using tRNAscan-SE [8] and ARAGORN [9]

Results

Genomic islands
• Many GIs are present at contig breaks, and these GIs are more likely to be missed by analysis of draft genomes

[Figure: number of GI predictions in Listeria genomes by distance in base pairs from the contig edge (log-scale bins from 0 to 999999), comparing predictions missed in draft genome analysis with predictions correctly identified in draft genome analysis.]

Gene function categories
• The “Replication, recombination, and repair” superfamily was significantly underrepresented in draft genomes of both L. monocytogenes and P. aeruginosa
• In particular, transposons tend to be missing from draft genomes

[Figure: proportion of total ORFs by COG superfamily, comparing completed genomes with regions missing from the draft genome.]

COG superfamily legend:
[A] RNA processing and modification
[B] Chromatin structure and dynamics
[C] Energy production and conversion
[D] Cell cycle control, cell division, chromosome partitioning
[E] Amino acid transport and metabolism
[F] Nucleotide transport and metabolism
[G] Carbohydrate transport and metabolism
[H] Coenzyme transport and metabolism
[I] Lipid transport and metabolism
[J] Translation, ribosomal structure and biogenesis
[K] Transcription
[L] Replication, recombination and repair
[M] Cell wall / membrane / envelope biogenesis
[N] Cell motility
[O] Posttranslational modification, protein turnover, chaperones
[P] Inorganic ion transport and metabolism
[Q] Secondary metabolites biosynthesis, transport and catabolism
[R] General function prediction only
[S] Function unknown
[T] Signal transduction mechanisms
[U] Intracellular trafficking, secretion, and vesicular transport
[V] Defense mechanisms
[W] Extracellular structures
[Z] Cytoskeleton

Genes of interest
• AMR genes: not significantly underrepresented in Listeria or Pseudomonas draft genomes
• Virulence factors: not significantly underrepresented in Listeria or Pseudomonas draft genomes
• tRNA genes: significantly underrepresented in Listeria and Pseudomonas draft genomes

[Figure: percent missing from draft Listeria genomes for all protein-coding genes, AMR genes, virulence factors, and tRNA genes.]

Conclusion

• Draft genomes have limitations: certain gene types, particularly those associated with mobile elements, are disproportionately missing
• Draft genome analysis is still valuable for VFs/AMR for the species examined, but more species should be studied

Acknowledgments

This work was made possible by funding from Genome Canada, Genome British Columbia, and the GRDI (Canada’s Federal Genomics Research and Development Initiative). Funding for project personnel was also provided by Cystic Fibrosis Canada, the Swiss National Science Foundation, the CIHR/MSFHR Bioinformatics Training Program, and the Michael Smith Foundation for Health Research.

References

1) Gilmour MW, et al. 2010 BMC Genomics 11:120.
2) De Soyza A, et al. 2013 MicrobiologyOpen 2(6):1010-23.
3) Darling AE, et al. 2010 PLoS One 5(6):e11147.
4) McArthur AG, et al. 2013 Antimicrob Agents Chemo 57(7):3348-57.
5) Dhillon BK, et al. 2015 NAR gkv401.
6) Chen L, et al. 2011 NAR gkr989.
7) Wattam AR, et al. 2014 NAR 42(D1):D581-9.
8) Lowe TM & Eddy SR 1997 NAR 25(5):955-64.
9) Laslett D & Canback B 2004 NAR 32(1):11-6.
10) Waack S, et al. 2006 BMC Bioinformatics 7:142.
11) Hsiao W, et al. 2003 Bioinformatics 19(3):418-20.
12) Tatusov RL, et al. 2003 BMC Bioinformatics 4:1.
13) Altschul SF, et al. 1997 NAR 25(17):3389-402.
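
The finding that GIs near contig breaks are more likely to be missed rests on measuring how far each GI prediction lies from the nearest contig edge and grouping those distances into order-of-magnitude bins (0, 1 to 9, 10 to 99, and so on). The Python sketch below illustrates one way such a measurement could be made. It is not the authors' code: the function names, the representation of contig edges as a sorted list of break positions in reference coordinates, and the rule that a GI overlapping a break has distance 0 are all assumptions made for illustration.

```python
import bisect

def distance_to_nearest_contig_edge(gi_start, gi_end, contig_edges):
    """Distance in bp from a GI to the closest contig break.

    contig_edges is assumed to be a sorted list of break positions in the
    same coordinate system as the GI (hypothetical representation).
    A GI that spans or touches a break gets distance 0.
    """
    if not contig_edges:
        raise ValueError("at least one contig edge is required")
    # Index of the first break at or after the GI start.
    i = bisect.bisect_left(contig_edges, gi_start)
    if i < len(contig_edges) and contig_edges[i] <= gi_end:
        return 0  # a break falls inside the GI
    candidates = []
    if i > 0:
        candidates.append(gi_start - contig_edges[i - 1])  # break upstream
    if i < len(contig_edges):
        candidates.append(contig_edges[i] - gi_end)  # break downstream
    return min(candidates)

def distance_bin(distance):
    """Label the order-of-magnitude bin for a distance, e.g. 200 -> '100 to 999'."""
    if distance == 0:
        return "0"
    digits = len(str(distance))
    return f"{10 ** (digits - 1)} to {10 ** digits - 1}"

# Hypothetical break positions and GI coordinates, for illustration only.
edges = [1000, 5000, 20000]
for start, end in [(1200, 1800), (4990, 5010), (5005, 5100)]:
    d = distance_to_nearest_contig_edge(start, end, edges)
    print(start, end, d, distance_bin(d))
```

Tallying the bin labels separately for GIs that were missed and GIs that were recovered in the draft analysis would give counts of the kind plotted on the poster.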