Talks given at the Session SY024 - Controversies in interpreting whole genome sequence data
9-April-2016 : http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases
ECCMID 2016 - How to build actionable virulome databases
1. João André Carriço,
Microbiology Institute and Instituto de Medicina Molecular,
Faculty of Medicine, University of Lisbon
jcarrico@fm.ul.pt twitter: @jacarrico
Session SY024 Controversies in interpreting whole genome sequence data
26th ECCMID, Amsterdam, Netherlands
7-12 April 2016
3. Virulence Factors:
Class of gene products
Help pathogens to invade the host and
evade specific host defence mechanisms
Enhance the pathogen’s potential to cause
disease
4. Virulence Factors (examples):
Bacterial toxins (Endotoxins and Exotoxins)
Adherence factors (Pili)
Cell surface carbohydrates and proteins that protect a
bacterium (Streptococcal M Protein)
Hydrolytic enzymes that may contribute to the
pathogenicity of the bacterium (hyaluronidase)
Factors to compete with host nutrient uptake
(Siderophores)
Sources: VFDB / Medical Microbiology, 4th edition (http://www.ncbi.nlm.nih.gov/books/NBK7627/)
12. All the databases have:
manually curated data
links for the original publication
However, manual curation is also a major weakness,
because the process is hard to sustain
13. Querying annotations on the website
Selecting species of interest, and browsing
the website
BLAST query for DNA or Protein
14. Download the gene/protein databases and
use them as templates for searching your own
data
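A minimal sketch of this workflow, assuming the downloaded database is a FASTA file of virulence gene sequences: the Python below parses FASTA text and reports which reference genes occur as exact matches, on either strand, in a user's contig. Exact matching is a deliberate simplification standing in for an alignment tool such as BLAST, and the gene names and sequences are made-up placeholders, not entries from any real database.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_hits(reference_genes, contig):
    """Report reference genes found verbatim in the contig, with strand."""
    contig = contig.upper()
    hits = []
    for name, seq in reference_genes.items():
        if seq in contig:
            hits.append((name, "+"))
        elif reverse_complement(seq) in contig:
            hits.append((name, "-"))
    return hits


# Placeholder "downloaded database" and "own data" (both hypothetical).
VF_FASTA = """>toyA hypothetical toxin
ATGGCTAAGT
TCGA
>toyB hypothetical adhesin
ATGCCCGGGTTTAA
"""
CONTIG = "GGGTTAAACCCGGGCATCCATGGCTAAGTTCGATT"

if __name__ == "__main__":
    print(find_exact_hits(parse_fasta(VF_FASTA), CONTIG))
```

In practice the exact-match step would be replaced by an alignment search that tolerates sequence variation; the parsing and reporting structure stays the same.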
16. With HTS, several core genome / whole genome MLST schemas are becoming available or being
developed:
Neisseria sp.
Campylobacter sp.
Staphylococcus aureus
Legionella pneumophila
Listeria monocytogenes
Enterococcus faecium
Mycobacterium tuberculosis
Acinetobacter baumannii
Salmonella enterica
Escherichia coli
….
Loci in these schemas can be annotated / linked to the Virulence Factor DBs for automatic
allele annotation through these systems
Seqsphere+
http://pubmlst.org/
http://bigsdb.web.pasteur.fr/
https://enterobase.warwick.ac.uk/
Bionumerics 7.5
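The linking idea above can be sketched as a simple join, assuming each schema locus carries a gene name that may also appear in a virulence factor database. All locus identifiers, gene names, and annotation fields below are hypothetical; the systems listed above each have their own data models, which this sketch does not reproduce.

```python
# Hypothetical VF database entries, keyed by gene name.
VF_DB = {
    "geneA": {"vf_class": "toxin", "source": "example VF database"},
    "geneB": {"vf_class": "adherence factor", "source": "example VF database"},
}

# Hypothetical cg/wgMLST schema: locus identifier -> gene name.
SCHEMA_LOCI = {
    "LOCUS_0001": "geneA",
    "LOCUS_0002": "geneX",  # not a known virulence factor
    "LOCUS_0003": "geneB",
}


def link_schema_to_vfdb(schema_loci, vf_db):
    """Attach VF annotations to every schema locus whose gene is in the VF DB."""
    return {
        locus: vf_db[gene]
        for locus, gene in schema_loci.items()
        if gene in vf_db
    }


def annotate_profile(profile, locus_annotations):
    """Annotate an isolate's allele profile ({locus: allele number})."""
    annotated = []
    for locus, allele in profile.items():
        if locus in locus_annotations:
            annotated.append(
                {"locus": locus, "allele": allele, **locus_annotations[locus]}
            )
    return annotated


if __name__ == "__main__":
    links = link_schema_to_vfdb(SCHEMA_LOCI, VF_DB)
    isolate = {"LOCUS_0001": 12, "LOCUS_0002": 3, "LOCUS_0003": 7}
    for row in annotate_profile(isolate, links):
        print(row)
```

Once the locus-to-annotation links exist, every newly called allele inherits the virulence factor annotation of its locus without further manual work.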
17. So far we have seen what is available
How can we design
actionable virulome databases?
Actionable: able to be done or acted on; having practical value
New Oxford American Dictionary
18. Available databases still lack interfaces for
programmatic access:
RESTful APIs would allow:
▪ easy automatic querying from scripts without the need
of web interfaces or downloads
▪ Database updates by authorized groups (distributed
curation effort)
APIs : Application Programming Interfaces
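To illustrate what script-based querying could look like, here is a minimal sketch using only the Python standard library. The base URL, the `/genes` endpoint, the query parameters, and the response layout are all assumptions made for illustration; they do not describe an API offered by any existing virulence factor database.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; no real service is implied.
BASE_URL = "https://virulome.example.org/api/v1"


def build_gene_query(species, vf_class=None):
    """Build the URL for a (hypothetical) gene listing request."""
    params = {"species": species}
    if vf_class:
        params["vf_class"] = vf_class
    return f"{BASE_URL}/genes?{urlencode(params)}"


def parse_gene_response(body):
    """Extract (name, vf_class) pairs from an assumed JSON response body."""
    payload = json.loads(body)
    return [(g["name"], g["vf_class"]) for g in payload.get("genes", [])]


def fetch_genes(species, vf_class=None, timeout=10):
    """Send the request and parse the reply (needs a live server to run)."""
    url = build_gene_query(species, vf_class)
    with urlopen(url, timeout=timeout) as resp:
        return parse_gene_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_gene_query("Staphylococcus aureus", "toxin"))
    # Canned reply in the assumed format, so the sketch runs offline.
    sample = '{"genes": [{"name": "geneA", "vf_class": "toxin"}]}'
    print(parse_gene_response(sample))
```

Keeping URL construction and response parsing separate from the network call makes the script testable without a live server.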
19. Existing DBs reuse each other's datasets without true
database interoperability: need for common ontologies
(controlled vocabularies already exist but are not used by
all)
Ontologies and computer-readable data formats (JSON-LD
or RDF) can allow for true database interoperability,
allowing bioinformaticians to extract the targeted
information from a single query reaching multiple
databases
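As a rough sketch of why shared terms matter, the Python below takes two records that use different local field names but map them, through a JSON-LD style `@context`, to the same term identifiers. A single query written against those identifiers then reaches both records. The vocabulary IRIs, field names, and records are hypothetical, and the `expand` helper is a simplified stand-in for a real JSON-LD processor, not an implementation of the JSON-LD expansion algorithm.

```python
import json

# Hypothetical shared vocabulary terms.
GENE_NAME = "http://example.org/vf#geneName"
FACTOR_CLASS = "http://example.org/vf#factorClass"

# Two databases describing entries with different local field names.
DB_ONE_RECORD = json.loads("""
{
  "@context": {"gene": "http://example.org/vf#geneName",
               "kind": "http://example.org/vf#factorClass"},
  "gene": "geneA",
  "kind": "toxin"
}
""")

DB_TWO_RECORD = json.loads("""
{
  "@context": {"symbol": "http://example.org/vf#geneName",
               "category": "http://example.org/vf#factorClass"},
  "symbol": "geneB",
  "category": "toxin"
}
""")


def expand(record):
    """Rename local keys to the IRIs declared in the record's @context."""
    context = record.get("@context", {})
    return {context[key]: value for key, value in record.items() if key in context}


def query(records, term_iri, value):
    """Return expanded records whose shared term equals the given value."""
    return [r for r in map(expand, records) if r.get(term_iri) == value]


if __name__ == "__main__":
    for hit in query([DB_ONE_RECORD, DB_TWO_RECORD], FACTOR_CLASS, "toxin"):
        print(hit[GENE_NAME])
```

The query never mentions `kind` or `category`; it only relies on the shared term, which is the interoperability that common ontologies are meant to provide.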
21. Major problems of databases
Manual curation still a necessity
Academic model for sustainability of a resource:
lack of funding leads to “dead” databases
22. Existing virulome databases provide a wealth of data
A large part of the available VF data overlaps between DBs.
The overlap largely depends on when each database was last
updated and what was included.
They are always a work in progress, relying heavily on
manual curation
Novel HTS-based techniques such as cg/wgMLST can use
these databases to annotate schemas and provide a much
richer picture of VF diversity at the DNA/protein level.
23. UMMI Members
Mário Ramirez
José Melo-Cristino
EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)
Mirko Rossi
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
Dag Harmsen (Univ. Muenster)
Stefan Niemann (Research Center Borstel)
Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
Joerg Rothganger (RIDOM)
Hannes Pouseele (Applied Maths)
Genome Canada IRIDA project (www.irida.ca)
Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC)
Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
Fiona Brinkman (SFU)
William Hsiao (BCCDC)
INTEGRATED RAPID INFECTIOUS DISEASE ANALYSIS