Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Making Use of NGS Data: From Reads to Trees and Annotations

1,126 views

Published on

Talk at ASM Microbe Boston 2016 - WS 24 Workshop - NGS for Microbial Genomics Surveillance and more - One Technology Fits All - 8.15- 4:15 PM

Published in: Science
  • Login to see the comments

  • Be the first to like this

Making Use of NGS Data: From Reads to Trees and Annotations

  1. 1. João André Carriço, PhD Microbiology Institute/Institute for Molecular Medicine Faculty of Medicine, University of Lisbon Portugal http://im.fm.ul.pt http://imm.fm.ul.pt http://www.joaocarrico.info WORKSHOP 24: NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL
  2. 2. Nothing to disclose
  3. 3.  This presentation is not intended to cover all available software or databases (we would need several weeks or months to do that)  I’ll present what I use or intend to use in a near future  I gladly accept any suggestions to included on similar presentations in the future.  It is supposed to be interactive so ask away during the presentation.
  4. 4.  What is in the reads FASTQ files  Available Databases  Virulence Factors and AMR DBs  Sequence-based typing databases: Pubmlst.org / Enterobase  HighThroughput Sequencing data analysis (freeware)  Prokka  Roary  Nullabor  Microreact.org  PHYLOViZ  Commercial Solutions  Bionumerics 7.5  CLC GenomicsWorkbench (CLC Bio)  Ridom Seqsphere+
  5. 5. Isolate Genome* Sequenced Reads Slide Source: Nick Loman Other isolates in the sequencing run Contamination * Chromosome + Plasmids + Phages
  6. 6. Virulence Factor Databases  VFDB (http://www.mgc.ac.cn/VFs/main.htm)  Pathosystems Resource Integration Center (PATRIC) VF (https)://www.patricbrc.org/)  Victors (http://www.phidias.us/victors/)  PHI-Base (http://www.phi-base.org/)  MvirDB (http://mvirdb.llnl.gov/ ) To know more: - Presentation on the Controversies in interpreting whole genome sequence data session : http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases
  7. 7.  Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)  Repository of Antibiotic resistanceCassetes (RAC)(http://rac.aihi.mq.edu.au/rac/)  Integrall :The integron database (http://integrall.bio.ua.pt/) (…)
  8. 8. http://www.pubmlst.org http://bigsdb.web.pasteur.fr/
  9. 9. slide by @happy_khan Martin Sergeant Mark Achtman Nabil-Fareed Alikhan Zhemin Zhou
  10. 10. To know more : http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now Reads (fastq files) contigs (fasta files) Annotated contigs (gbk/gff files) Roary :PanGenome Analysis Enterobase BIGSdb Nullabor PHYLOViZ: Tree + metada visualization Microreact.org: Tree +metadata +vizualization Prokka De novo assembler
  11. 11.  Genome annotation made easy byTorsten Seemann (slides byTorsten)  Genome annotation: adding biological information to the sequence, by describing features To know more : http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013 Available at: https://github.com/tseemann/prokka
  12. 12.  Pan genome analysis by Andrew Page  Available at: https://sangerpathogens.github.io/Roary/ Core genome Accessory genome Pan-genome
  13. 13.  Inputs:Annotated de novo assemblies (GFF files) • Typically from the annotation pipeline  Outputs: • Spreadsheet with presence and absence of genes • Multi-FASTA alignment of core genes so you can build a tree without a reference • Multi-FASTA alignments for each gene • Plots for the open/closed genome, unique genes • Integrates with Phandango so you can visualise all structural variation • QC report from Kraken to help identify suspect samples (Slide by Andrew Page)
  14. 14. Core (n or n-1 strains) Soft-Core (n-2 or n-3 strains) Shell ( 8(?) to n-3 strains) Cloud ( <8 (?) strains) Core genome: Core + Soft-Core Accessory genome: Shell + Cloud
  15. 15. iCANDY output of presence and absence of genes in accessory genome. S. Weltevreden & public S. enterica genomes (Slide by Andrew Page)
  16. 16.  Complete pipeline from reads to reports byTorsten Seemann  Objective is automate analysis for everyday use on public health labs /research settings  Uses and distills outputs by a lot of software  Avaliable at: https://github.com/tseemann/nullarbor
  17. 17. Slide byTorsten Seeman
  18. 18. From: https://github.com/tseemann/nullarbor
  19. 19. Slides byTorsten Seeman
  20. 20. www.phyloviz.net
  21. 21. Inputs: - Tab separated txt (profiles) - Fasta files - Automatic database retrieval (MLST) Outputs: • goeBURST and goeBURST MST • Link quality assessment • High quality images Can be easily applied to: - MLST/ cgMLST/wgMLST - MLVA - SNP data* - Gene Presence/absence
  22. 22. New features: • Hierarchical clustering • Neighbor-Joining • Project Saving
  23. 23.  Available at http://online.phyloviz.net  Web based version of PHYLOViZ  Allows users to create their own datasets, save them and share their data (privately or publicly)  REST API available  Scalable to thousands of nodes  Tree Analysis tools:  Interactive distance matrix  NLV graph
  24. 24. Slide by @happy_khan
  25. 25. NLV Graph Tree cut-off Full MST
  26. 26. Create Selections Change tree options
  27. 27.  Available at http://microreact.org/  Presentation on session Harnessing whole genome sequence data for public health applications : Novel open access tools forWGS- based pathogen surveillance and the identification of high-risk clones  http://eccmidlive.org/#resources/novel-open-access-tools-for- wgs-based-pathogen-surveillance-and-the-identification-of-high- risk-clones
  28. 28. • Ridom Seqsphere+ : http://www.ridom.de/seqsphere/ • Applied Maths Bionumerics 7.6: http://www.applied-maths.com/bionumerics • CLCBioGenomicWorkbench : http://www.clcbio.com/blog/clc-genomics-workbench-7-5/
  29. 29. • Huge variety of software and database solutions • There is no single One-Size-Fits-All solution (job security for bioinformaticians) • Different questions require different approaches • Always question the results and data provenance
  30. 30.  ECCMID2015 Meet-the-expert session on “What bioinformatic tools should I use for analysis of HighThroughput Sequencing data for molecular diagnostics? ”  Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015- meettheexpert-bioinformatics-tools  João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet- theexpert2015
  31. 31.  UMMI Members  Bruno Gonçalves  Mário Ramirez  José Melo-Cristino  INESC-ID  Alexandre Francisco  Cátia Vaz  Marta Nascimento  EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)  Mirko Rossi  FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):  Dag Harmsen (Univ. Muenster)  Stefan Niemann (Research Center Borstel)  Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)  Joerg Rothganger (RIDOM)  Hannes Pouseele (Applied Maths)  Genome Canada IRIDA project (www.irida.ca)  Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar(NLM , PHAC)  Ed Taboada and Peter Kruczkiewicz (LabFoodborne Zoonoses, PHAC)  Fiona Brinkman (SFU)  William Hsiao (BCCDC) INTEGRATED RAPID INFECTIOUS DISEASE ANALYSIS

×