Making Use of NGS Data: From Reads to Trees and Annotations
1. João André Carriço, PhD
Microbiology Institute/Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal
http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info
WORKSHOP 24:
NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL
3. This presentation is not intended to cover all available software or databases (that would take several weeks or months)
I'll present what I use or intend to use in the near future
I gladly accept any suggestions to include in similar presentations in the future
It is meant to be interactive, so ask away during the presentation
4. What is in the reads (FASTQ files)
Available Databases
Virulence Factors and AMR DBs
Sequence-based typing databases: PubMLST.org / Enterobase
High-Throughput Sequencing data analysis (freeware)
Prokka
Roary
Nullarbor
Microreact.org
PHYLOViZ
Commercial Solutions
BioNumerics 7.5
CLC Genomics Workbench (CLC Bio)
Ridom SeqSphere+
11. To know more:
http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now
Workflow diagram:
Reads (FASTQ files) → De novo assembler → Contigs (FASTA files) → Prokka → Annotated contigs (GBK/GFF files) → Roary: pan-genome analysis
Also shown: Enterobase, BIGSdb, Nullarbor
PHYLOViZ: tree + metadata visualization
Microreact.org: tree + metadata + visualization
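The reads produced by the sequencer arrive as FASTQ files, where each read takes four lines: a header starting with `@`, the base calls, a separator line starting with `+`, and one quality character per base. As a minimal sketch (not part of any tool in this presentation), the Python below parses uncompressed four-line records and reports read count, total bases and mean base quality. It assumes Phred+33 quality encoding, as used by current Illumina data; the function names `read_fastq` and `fastq_stats` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record after {header!r}") from None
        if not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed record {header!r}")
        # The read ID is the first word of the header, without the '@'
        yield header[1:].split()[0], seq, qual


def fastq_stats(lines: Iterable[str], offset: int = 33) -> dict:
    """Count reads and bases, and average the per-base Phred scores."""
    n_reads = n_bases = qual_sum = 0
    for _, seq, qual in read_fastq(lines):
        n_reads += 1
        n_bases += len(seq)
        # Phred score = ASCII code minus the encoding offset (33 assumed)
        qual_sum += sum(ord(c) - offset for c in qual)
    mean_q = qual_sum / n_bases if n_bases else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_quality": mean_q}


example = ["@read1 extra\n", "ACGT\n", "+\n", "IIII\n",
           "@read2\n", "GGC\n", "+\n", "!!5\n"]
print(fastq_stats(example))
```

For gzipped files, `gzip.open(path, "rt")` yields text lines that can be passed to `fastq_stats` unchanged.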
12. Genome annotation made easy, by Torsten Seemann (slides by Torsten)
Genome annotation: adding biological information to the sequence by describing its features (e.g. coding sequences, tRNAs, rRNAs)
To know more:
http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013
Available at: https://github.com/tseemann/prokka
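Prokka writes its annotation in several formats, including GFF3, where each feature is one tab-separated line with nine columns (sequence ID, source, type, start, end, score, strand, phase, attributes) and the contig sequences follow a `##FASTA` line. The sketch below assumes that layout and `key=value` attributes such as `locus_tag` and `product`, and tallies features by type. `parse_gff3` is a hypothetical helper, and the example lines are made up for illustration.

```python
from collections import Counter
from urllib.parse import unquote


def parse_gff3(lines):
    """Yield one dict per GFF3 feature line, stopping at an embedded ##FASTA section."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # the remaining lines are contig sequences, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for field in attrs.split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attributes[key] = unquote(value)  # GFF3 percent-escapes special characters
        yield {"seqid": seqid, "type": ftype, "start": int(start),
               "end": int(end), "strand": strand, "attributes": attributes}


example_gff = [
    "##gff-version 3\n",
    "contig1\texample\tCDS\t1\t300\t.\t+\t0\tID=EX_00001;locus_tag=EX_00001;product=hypothetical protein\n",
    "contig1\texample\ttRNA\t400\t475\t.\t-\t.\tID=EX_00002;locus_tag=EX_00002;product=tRNA-Ala\n",
    "##FASTA\n",
    ">contig1\n",
    "ACGT\n",
]
features = list(parse_gff3(example_gff))
print(Counter(f["type"] for f in features))
```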
13. Pan-genome analysis, by Andrew Page
Available at: https://sangerpathogens.github.io/Roary/
Diagram: core genome + accessory genome = pan-genome
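The diagram's terms can be made precise with set operations: the pan-genome is every gene cluster seen in at least one genome, the core genome is the clusters shared by (nearly) all genomes, and the accessory genome is the remainder. In this sketch the input is assumed to be a mapping from genome name to the set of gene clusters it carries (the kind of information in Roary's presence/absence spreadsheet). `core_fraction` is a hypothetical parameter: 1.0 means present in every genome, whereas Roary itself uses a configurable percentage threshold for core genes (check its documentation for the default).

```python
from collections import Counter


def pan_genome(presence, core_fraction=1.0):
    """Split gene clusters into pan, core, accessory and unique sets.

    presence: mapping genome name -> set of gene cluster names it carries.
    core_fraction: hypothetical threshold; a cluster is 'core' if it occurs
    in at least core_fraction * (number of genomes) genomes.
    """
    n_genomes = len(presence)
    # Number of genomes carrying each cluster (sets, so each genome counts once)
    counts = Counter(gene for genes in presence.values() for gene in genes)
    pan = set(counts)
    core = {g for g, c in counts.items() if c >= core_fraction * n_genomes}
    return {
        "pan": pan,
        "core": core,
        "accessory": pan - core,
        "unique": {g for g, c in counts.items() if c == 1},  # seen in one genome only
    }


# Made-up presence/absence data for three isolates
presence = {
    "isolate_A": {"dnaA", "gyrB", "blaTEM"},
    "isolate_B": {"dnaA", "gyrB", "stx2"},
    "isolate_C": {"dnaA", "gyrB", "blaTEM"},
}
print(pan_genome(presence))
print(pan_genome(presence, core_fraction=0.6))
```

Lowering `core_fraction` moves clusters from the accessory to the core set, which is why pan-genome summaries should always state the threshold used.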
14. Inputs: annotated de novo assemblies (GFF files)
• Typically from the annotation pipeline (e.g. Prokka)
Outputs:
• Spreadsheet with the presence and absence of genes
• Multi-FASTA alignment of core genes, so you can build a tree without a reference
• Multi-FASTA alignments for each gene
• Plots of the open/closed pan-genome and unique genes
• Integrates with Phandango, so you can visualize all structural variation
• QC report from Kraken to help identify suspect samples
(Slide by Andrew Page)
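The core-gene alignment is what makes a reference-free tree possible. One simple route, sketched below, is to count pairwise SNP differences between the aligned sequences, skipping gaps and `N`s, and pass the resulting distance matrix to a distance-based method such as neighbour-joining. The input is assumed to be a multi-FASTA alignment (Roary typically writes it as `core_gene_alignment.aln`; check your version); the helper names are hypothetical and the toy alignment is made up.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse multi-FASTA lines into {name: sequence}; name is the first word after '>'."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)  # sequences may be wrapped over several lines
    return {k: "".join(v) for k, v in seqs.items()}


def snp_distance(a, b, ignore="-N"):
    """Count aligned positions that differ, skipping gaps and ambiguous N calls."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    skip = set(ignore)
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x not in skip and y not in skip)


def distance_matrix(alignment):
    """Return (names, {(p, q): distance}) for every ordered pair of sequences."""
    names = list(alignment)
    dist = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        dist[(p, q)] = dist[(q, p)] = snp_distance(alignment[p], alignment[q])
    return names, dist


aln = read_fasta([">A", "ACGTAC", ">B", "ACGTTC", ">C", "AC-TTG"])
names, dist = distance_matrix(aln)
for p in names:
    print(p, [dist[(p, q)] for q in names])
```

Ignoring gapped positions pairwise is a design choice; some tools instead drop any column containing a gap before computing distances, which can give different numbers.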
16. iCANDY output of the presence and absence of genes in the accessory genome:
S. Weltevreden and public S. enterica genomes
(Slide by Andrew Page)
17. Nullarbor: complete pipeline from reads to reports, by Torsten Seemann
The objective is to automate analysis for everyday use in public health labs and research settings
Runs many existing tools and distills their outputs into a single report
Available at: https://github.com/tseemann/nullarbor
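Nullarbor starts from paired-end reads listed in a tab-separated input file. Assuming the three-column layout (isolate ID, R1 file, R2 file) described in the Nullarbor README, and assuming read files are named with `_R1`/`_R2` tags, the sketch below pairs read files by name and writes such a table. `build_sample_sheet` and the naming convention are assumptions for illustration, not part of Nullarbor; verify the expected input format against the version you install.

```python
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fastq.gz, or Illumina-style <sample>_R1_001.fastq.gz
READ_NAME = re.compile(r"^(?P<sample>.+?)_R(?P<mate>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$")


def build_sample_sheet(paths):
    """Return tab-separated 'sample<TAB>R1<TAB>R2' lines, one per paired sample."""
    pairs = {}
    for p in paths:
        m = READ_NAME.match(Path(p).name)
        if not m:
            continue  # not a paired-end FASTQ file under the assumed convention
        pairs.setdefault(m["sample"], {})[m["mate"]] = str(p)
    rows = []
    for sample in sorted(pairs):
        mates = pairs[sample]
        if set(mates) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: {sorted(mates)}")
        rows.append(f"{sample}\t{mates['1']}\t{mates['2']}")
    return "\n".join(rows) + "\n"


files = [
    "reads/isoA_R1.fastq.gz", "reads/isoA_R2.fastq.gz",
    "reads/isoB_S2_L001_R1_001.fastq.gz", "reads/isoB_S2_L001_R2_001.fastq.gz",
    "reads/notes.txt",
]
print(build_sample_sheet(files), end="")
```

Failing loudly on a missing mate is deliberate: a silently dropped sample is harder to notice in an automated run than an error.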
24. PHYLOViZ Online, available at http://online.phyloviz.net
Web-based version of PHYLOViZ
Allows users to create their own datasets, save them, and share them (privately or publicly)
REST API available
Scalable to thousands of nodes
Tree analysis tools:
Interactive distance matrix
NLV (n-locus variant) graph
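In an NLV (n-locus variant) graph, each node is an allelic profile (for example an MLST sequence type) and two profiles are linked when they differ at no more than n loci. The sketch below computes pairwise allelic distances (the number of loci with different allele numbers), keeps edges within the threshold, and groups linked profiles with a union-find. It illustrates the idea under these assumptions and is not PHYLOViZ's implementation; the profiles and ST names are made up, and missing alleles are not handled.

```python
from itertools import combinations


def allelic_distance(p, q):
    """Number of loci at which two allelic profiles carry different alleles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(p, q) if a != b)


def nlv_graph(profiles, n):
    """Edges (s, t, distance) between profiles differing at <= n loci."""
    edges = []
    for (s, p), (t, q) in combinations(profiles.items(), 2):
        d = allelic_distance(p, q)
        if d <= n:
            edges.append((s, t, d))
    return edges


def components(nodes, edges):
    """Group nodes connected by the edges, using union-find with path halving."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for s, t, _ in edges:
        parent[find(s)] = find(t)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=sorted)


# Made-up 7-locus profiles
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (5, 6, 7, 8, 9, 10, 11),
}
edges = nlv_graph(profiles, 1)
print(edges)
print(components(list(profiles), edges))
```

With n = 1 the groups resemble clonal complexes linked by single-locus variants; raising n merges more distant profiles into the same group.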
31. Microreact, available at http://microreact.org/
Presented in the session “Harnessing whole genome sequence data for public health applications: Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones”
http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones
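Microreact displays a phylogenetic tree alongside a metadata table, matching table rows to tree leaves by an identifier. A common source of errors is a mismatch between leaf names and metadata IDs, so the sketch below reads a Newick string and a CSV table and reports IDs present in only one of them. It assumes the identifier column is named `id` (check the current Microreact documentation) and handles only unquoted Newick labels; `check_microreact_inputs` is a hypothetical helper and the data are made up.

```python
import csv
import io
import re

# Unquoted leaf labels follow "(" or "," and end at ":", ",", ")" or ";".
# Internal node labels follow ")" and are therefore not matched.
LEAF = re.compile(r"[(,]\s*([^\s(),:;\[\]']+)")


def newick_leaves(newick):
    """Return the set of unquoted leaf labels in a Newick string."""
    return set(LEAF.findall(newick))


def check_microreact_inputs(newick, metadata_csv, id_column="id"):
    """Report leaves lacking metadata rows, and metadata rows lacking leaves."""
    leaves = newick_leaves(newick)
    reader = csv.DictReader(io.StringIO(metadata_csv))
    if reader.fieldnames is None or id_column not in reader.fieldnames:
        raise ValueError(f"metadata has no {id_column!r} column")
    ids = {row[id_column] for row in reader}
    return {
        "leaves_without_metadata": sorted(leaves - ids),
        "metadata_without_leaf": sorted(ids - leaves),
    }


tree = "((A:0.1,B:0.2)90:0.05,C:0.3);"
metadata = "id,country,year\nA,PT,2015\nB,CA,2014\nD,PT,2016\n"
print(check_microreact_inputs(tree, metadata))
```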
34. • Huge variety of software and database solutions
• There is no single One-Size-Fits-All solution (job
security for bioinformaticians)
• Different questions require different approaches
• Always question the results and data provenance
35. ECCMID 2015 Meet-the-Expert session on “What bioinformatic tools should I use for analysis of High-Throughput Sequencing data for molecular diagnostics?”
Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools
João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015
36. UMMI Members
Bruno Gonçalves
Mário Ramirez
José Melo-Cristino
INESC-ID
Alexandre Francisco
Cátia Vaz
Marta Nascimento
EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)
Mirko Rossi
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
Dag Harmsen (Univ. Muenster)
Stefan Niemann (Research Center Borstel)
Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
Joerg Rothganger (RIDOM)
Hannes Pouseele (Applied Maths)
Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca)
Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC)
Ed Taboada and Peter Kruczkiewicz (Laboratory for Foodborne Zoonoses, PHAC)
Fiona Brinkman (SFU)
William Hsiao (BCCDC)