IRIDA's Genomic epidemiology application ontology (GenEpiO): Genomic, clinical and epidemiological data standardization and integration
Simon Fraser University, Greater Vancouver, Canada
On behalf of the IRIDA Ontology WG
(Will Hsiao & Damion Dooley (BC Public Health Lab), Fiona Brinkman (SFU))
IMMEM XI, Estoril, Portugal
March 11, 2016
Contextual Information is Crucial for Interpreting Genomics Data.
Microbial genomics is a high-resolution tool for identification.
“Ontologies are for the digital age what dictionaries were in the age of print.”
Ontology, A Way of Structuring Information.
• Standardized hierarchy of well-defined terms
• interconnected with logical relationships
• “knowledge-generation engine”
Ontologies Standardize Vocabulary and Enable Complex Querying.
[Figure: Simple Food Ontology Hierarchy. Terms shown: Animal Feed, Poultry, Water; Pellets, Nuggets, Deli Meats, Bottled, Well; Spinach, Sprouts, Whole Mice]
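As a rough illustration of how a term hierarchy enables broader queries, the Python sketch below stores each term's parent and selects records by any broader term. The parent assignments, the "Food" root and the sample records are assumptions made for this example; they are not taken from GenEpiO or from the original figure.

```python
# Illustrative is-a hierarchy. The parent assignments and the "Food" root
# are assumptions for this sketch, not the structure of GenEpiO.
PARENT = {
    "Pellets": "Animal Feed",
    "Whole Mice": "Animal Feed",
    "Nuggets": "Poultry",
    "Deli Meats": "Poultry",
    "Bottled": "Water",
    "Well": "Water",
    "Animal Feed": "Food",
    "Poultry": "Food",
    "Water": "Food",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_a(term, broader):
    """True if `term` equals `broader` or sits anywhere beneath it."""
    return term == broader or broader in ancestors(term)


def select(records, broader):
    """Keep records whose 'food' value falls under `broader`."""
    return [r for r in records if is_a(r["food"], broader)]


# Hypothetical records, each tagged with a specific food term.
records = [
    {"case": "c1", "food": "Nuggets"},
    {"case": "c2", "food": "Well"},
    {"case": "c3", "food": "Deli Meats"},
]

# Asking for the broad term also returns records tagged with narrower terms.
poultry_cases = [r["case"] for r in select(records, "Poultry")]
```

A free-text field could not answer the "Poultry" query without someone listing every narrower product by hand; here the hierarchy carries that knowledge.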
Case Studies: Ontology Can Help Resolve Issues of Taxonomy, Granularity and Specificity.
[Figure: a) Taxonomy & Granularity. Terms shown: Endive, Iceberg, Spinacia oleracea, Amaranthus hybridus; annotations: found in N. America, found in S. Africa, Equivalent Subtypes. Further labels: spices, chicken breast, (Grocery Store vs ...)]
Ontology Acts Like A Rosetta Stone.
• Need a common language
• Humans AND computers need to read it
• Mapping allows interoperability AND
*ontologies can be translated into different human languages as well
Rosetta Stone – Egypt, 196 BC
• stone tablet translating same text into different ancient languages
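The mapping idea can be sketched in Python as a lookup from each site's local wording to a shared identifier. Everything named here is hypothetical: the `EX:` identifiers, the site names and the local terms are placeholders for illustration, not real GenEpiO IDs or real laboratory vocabularies.

```python
# Hypothetical shared identifiers; real ontology IDs would replace these.
SHARED_LABELS = {"EX:0001": "spinach", "EX:0002": "chicken breast"}

# Each site's local wording mapped onto the shared identifiers (assumed data).
SITE_MAPPINGS = {
    "lab_a": {"Spinach (bagged)": "EX:0001", "Chkn breast": "EX:0002"},
    "lab_b": {"épinards": "EX:0001", "poitrine de poulet": "EX:0002"},
}


def to_shared(site, local_term):
    """Translate a site's local term to a shared ID, or None if unmapped."""
    return SITE_MAPPINGS.get(site, {}).get(local_term)


def same_concept(site_1, term_1, site_2, term_2):
    """True only if both local terms map to the same shared ID."""
    id_1 = to_shared(site_1, term_1)
    return id_1 is not None and id_1 == to_shared(site_2, term_2)
```

Each site keeps its own wording for data entry, and comparison happens on the shared identifier. Unmapped terms return `None` rather than a guess, so gaps in the mapping stay visible.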
To Develop a Useful Gen Epi Ontology, Engaging the End Users is Your
Medical & Environmental
& Lab Personnel
Software and Work Flows
Interview users Examine resources
GenEpiO Combines Different Epi, Lab, Genomics and Clinical Data Fields.
Serotyping, Phage typing
Isolation Source (Food, Host
Patient demographics, Medical
Symptoms, Health Status
Use computers to
etc among genomics
Example: Automating Case Definition generation
Correlate Genomics Salmonella Cluster A cases between 01 Mar 2015 and 15 Mar 2015 with High-Risk Food Types (Spinach, Leafy Greens) and Geographical Location (Vancouver)
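A case definition like this one could be expressed as a filter over standardized fields. The sketch below is a minimal Python illustration, not IRIDA's implementation: the `Case` record, its field names, the sample cases, the inclusive date bounds and the assumption that Spinach and Iceberg sit under Leafy Greens are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date

# Assumed is-a links for this example only.
BROADER = {"Spinach": "Leafy Greens", "Iceberg": "Leafy Greens"}


@dataclass
class Case:
    """Hypothetical line-list record with standardized fields."""
    case_id: str
    cluster: str
    onset: date
    food: str
    location: str


def under(term, broader):
    """True if `term` is `broader` or a narrower term beneath it."""
    while term is not None:
        if term == broader:
            return True
        term = BROADER.get(term)
    return False


def matches(case, cluster, start, end, food, location):
    """Apply the case definition; date bounds are treated as inclusive."""
    return (
        case.cluster == cluster
        and start <= case.onset <= end
        and under(case.food, food)
        and case.location == location
    )


# Invented sample data.
cases = [
    Case("k1", "Salmonella Cluster A", date(2015, 3, 5), "Spinach", "Vancouver"),
    Case("k2", "Salmonella Cluster A", date(2015, 3, 20), "Spinach", "Vancouver"),
    Case("k3", "Salmonella Cluster B", date(2015, 3, 10), "Iceberg", "Vancouver"),
    Case("k4", "Salmonella Cluster A", date(2015, 3, 15), "Iceberg", "Vancouver"),
    Case("k5", "Salmonella Cluster A", date(2015, 3, 2), "Nuggets", "Vancouver"),
]

selected = [
    c.case_id
    for c in cases
    if matches(c, "Salmonella Cluster A", date(2015, 3, 1), date(2015, 3, 15),
               "Leafy Greens", "Vancouver")
]
```

When the case definition changes, only the arguments to `matches` change; the records themselves are untouched, so cases can be re-selected as the investigation evolves.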
GenEpiO Will Help Integrate Genomics and Epidemiological Data
in the IRIDA Platform.
Integrated Rapid Infectious Disease Analysis Platform
Find out more about IRIDA from
Will Hsiao (BC Public Health Lab) on
Sat Mar 12 in the Molecular
Epidemiology and Public Health
GenEpiO has been Implemented in Different IRIDA Interfaces.
• Creates BioSample-Compliant Genome Submission Forms.
Metadata Manager: Data entry portal
• Implements GenEpiO terms
• Facilitates descriptive metadata
• Secure environment
• Selective sharing
IRIDA Offers Line List Visualizations of Selectable Data Based on GenEpiO Fields.
1. Line List
Symptoms and Onset
GenEpiO is Standardizing Terms for Reporting and Quality Control.
A Genomic Epidemiology Ontology has Advantages for Public Health.
Improved Public Health
1. Eliminates semantic ambiguity
2. Term-mapping allows customization
3. Faster data integration
4. Standardized quality control and result reporting trigger actionable events in the same way
5. Reproducibility (accreditation, validation)
The Future Ontology Development Will Focus On Three Key Areas.
Genomic Epidemiology Ontology is Like Instrumentation for
Your Contextual Information…it Needs Maintenance and
We’re forming a Genomic Epidemiology Ontology Consortium.
Join us!
Integrated Rapid Infectious Disease Analysis Project
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
Dr. Rob Beiko - Dalhousie
Dr. Eduardo Taboada - LFZ
Dr. Morag Graham - NML
Dr. João Andre Carrico – University of Lisbon
National Microbiology Laboratory (NML)
Laboratory for Foodborne Zoonoses (LFZ)
Simon Fraser University (SFU)
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC)
University of Maryland
Canadian Food Inspection Agency (CFIA)
Ontology: a way of organizing information in a hierarchy of well-defined terms that are interconnected with logical relationships
Well-defined; reuses terms from different domains; uses IDs to disambiguate meaning and control for synonyms
Integrates different data types; the extra information layer provides a “knowledge-generation engine”
Taxonomy differences (domesticated vs wild types; between countries, e.g., spinach is not the same plant in Africa as in North America)
Relationships between consumers and food consumed
Relationships specifying food processing, preservation, distribution
Relationships describing how consumer and pathogen can interact, e.g., transmission routes
Provides means for automation of routine processes, improved querying
Genomic Epidemiology Requires a Lot of Different Types of Contextual Data.
Conducted interviews to create user profiles (to identify user capabilities, expectations and requirements) and understand information flow
To define the different users' needs and requirements:
bioinformatics training and expertise
types of software they use
daily activities and duties
issues and concerns regarding current systems
requirements for a WGS platform
PH Users include:
“Person, place, time”
Exposure, food items, geographical information, symptoms, onset of symptoms
Created (manually in Excel) on an ad hoc basis per investigation
Need to be shared between stakeholders, but data governance is an issue
What distinguishes IRIDA, in addition to being a unique collaboration between different types of collaborators, is its use of standards throughout the platform.
It is much easier and more effective to collect metadata prospectively than to collect it retrospectively from different lab notebooks, databases and health authorities (which requires asking for permission)
Prompts user to input epidemiologically useful info at point of sample intake/prior to submission (benefitting NEXT user)
Facilitates use of common language that can be shared
Archiving, select cases as case definition changes
Create a smaller core (Lab, Epi exposure, and Food) ontology for line-list testing
Create a consortium whose groups take on different domains of the Genomic Epidemiology Application Ontology
Pursuing longer-term funding for ontology