IRIDA's Genomic epidemiology application ontology (GenEpiO): Genomic, clinical and epidemiological data standardization and integration

  1. IRIDA’s Genomic Epidemiology Application Ontology (GenEpiO): Genomic, Clinical and Epidemiological Data Standardization and Integration. Emma Griffiths, Brinkman Lab, Simon Fraser University, Greater Vancouver, Canada. On behalf of the IRIDA Ontology WG: Will Hsiao and Damion Dooley (BC Public Health Lab), Fiona Brinkman (SFU). IMMEM XI, Estoril, Portugal, March 11, 2016
  2. Contextual Information is Crucial for Interpreting Genomics Data. Microbial genomics is a high resolution tool for identification.
  3. Contextual Information Needs to be Shared… So Keep the Next User in Mind. International Partners, Intervention Partners
  4. The of Contextual Information Isn’t STANDARDIZED
  5. When Words Can Mean Different Things. Semantic Ambiguity.
  6. Ontology, A Way of Structuring Information. “Ontologies are for the digital age what dictionaries were in the age of print.” • Standardized, well-defined hierarchy of terms • Interconnected with logical relationships • “Knowledge-generation engine”. Diagram labels: Logic, Vocabulary, Hierarchy, Knowledge Extraction, Ontology
  7. Ontologies Standardize Vocabulary and Enable Complex Querying. Simple Food Ontology Hierarchy: Animal Feed, Poultry, Water, Pellets, Nuggets, Deli Meats, Bottled, Well, Produce, Spinach, Sprouts, Whole Mice. Relationship labels: Transmission through_ingestion or contact; Treated by_filtration; Taxonomy_Spinacia oleracea; Preparation_Ready-to-Eat; Animal (Consumer)_Snake; Synonym_Cold Cuts
  8. Case Studies: Ontology Can Help Resolve Issues of Taxonomy, Granularity and Specificity. a) Taxonomy & Granularity: Leafy Greens, Spinach, Lettuce, Endive, Iceberg; Spinacia oleracea, Amaranthus hybridus; Taxonomy_species found in N. America; Taxonomy_species found in S. Africa; Equivalent Subtypes of Lettuce. b) Specificity: Poultry, Chicken, Nuggets, Breast; Processing_Ready-to-Eat; Composition_breading, spices, chicken breast; Location of Purchase_Retail (Grocery Store vs Butcher); Preparation_marinated
  9. Ontology Acts Like A Rosetta Stone. • Need a common language • Humans AND computers need to read it • Mapping allows interoperability AND customization. *Ontologies can be translated into different human languages as well. Rosetta Stone – Egypt, 196 BC • stone tablet translating same text into different ancient languages
  10. Ontology Offers Faster, More Accurate Data Integration.
  11. The Mission: Developing an Ontology Resource for Genomic Epidemiology in Canada
  12. To Develop a Useful Gen Epi Ontology, Engaging the End Users is Your TOP Priority. Users: Medical & Environmental Microbiologists, Bioinformaticians, Surveillance Analysts & Lab Personnel, Epidemiologists. Resources: Software and Work Flows, Investigation Tools, Instrumentation. Interview users; examine resources. GenEpiO (Genomic Epidemiology Application Ontology)
  13. GenEpiO Combines Different Epi, Lab, Genomics and Clinical Data Fields. Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR. Sample Metadata: Isolation Source (Food, Host Body Product, Environmental), BioSample. Epidemiology Investigation: Exposures. Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status. Reporting: Case/Investigation Status.
  14. GenEpiO Will Help Integrate Genomics and Epidemiological Data in the IRIDA Platform. Use computers to identify common exposures, symptoms, etc. among genomics clusters. Example: Automating Case Definition generation. Correlate Genomics (Salmonella Cluster A) cases between 01 Mar 2015 and 15 Mar 2015 with High-Risk Food Types (Spinach → Leafy Greens) and Geographical Location of Vancouver.
  15. Integrated Rapid Infectious Disease Analysis Platform. Find out more about IRIDA from Will Hsiao (BC Public Health Lab) on Sat Mar 12 in the Molecular Epidemiology and Public Health session! Website: IRIDA.ca Email: IRIDA-mail@sfu.ca GitHub: https://github.com/phac-nml/irida
  16. GenEpiO has been Implemented in Different IRIDA Interfaces. Metadata Manager: Data entry portal • Creates BioSample-Compliant Genome Submission Forms • Implements GenEpiO terms • Facilitates descriptive metadata • Secure environment • Selective sharing
  17. IRIDA Offers Line List Visualizations of Selectable Data Based on GenEpiO Fields. 1. Line List View 2. Timeline View. Hideable cases. Selectable fields: Travel, Symptoms and Onset, Exposure Types, Hospitalization
  18. GenEpiO Testing Has Made GenEpiO More Robust. • FWS Datasets
  19. GenEpiO is Standardizing Terms for Reporting and Quality Control. • Reproducibility
  20. A Genomic Epidemiology Ontology has Advantages for Public Health. Improved Public Health Investigation power! 1. Eliminates semantic ambiguity 2. Term-mapping allows customization 3. Faster data integration 4. Standardized quality control and result reporting trigger actionable events in same way 5. Reproducibility (accreditation, validation)
  21. The Future Ontology Development Will Focus On Three Key Areas. Food, Antimicrobial Resistance, Epidemiology
  22. Genomic Epidemiology Ontology is Like Instrumentation for Your Contextual Information… it Needs Maintenance and Improvements. We’re forming a Genomic Epidemiology Ontology Consortium. Join us!
  23. E-mail: IRIDA-mail@sfu.ca https://github.com/Public-Health-Bioinformatics/IRIDA_ontology
  24. Acknowledgements. Integrated Rapid Infectious Disease Analysis Project, www.IRIDA.ca. Primary Investigators: Fiona Brinkman (SFU), Will Hsiao (PHMRL), Gary Van Domselaar (NML). Co-Investigators: Dr. Rob Beiko (Dalhousie), Dr. Eduardo Taboada (LFZ), Dr. Morag Graham (NML), Dr. João Andre Carrico (University of Lisbon). National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olsen, Tara Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer. Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall. Simon Fraser University (SFU): Emma Griffiths, Geoff Winsor, Julie Shay, Bhav Dhillon, Claire Bertelli. BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Natalie Prystajecky, Jennifer Gardy, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Damion Dooley, Cletus D’Souza. University of Maryland: Lynn Schriml. Canadian Food Inspection Agency (CFIA): Adam Koziol, Burton Blais, Catherine Carrillo. Dalhousie University: Alex Keddy.
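
The case definition example on slide 14 (Salmonella Cluster A, onset between 01 Mar 2015 and 15 Mar 2015, exposure rolling up to Leafy Greens, location Vancouver) could be automated roughly as follows. This is a minimal Python sketch, not IRIDA or GenEpiO code: the `Case` and `CaseDefinition` classes, the `PARENT` map, and every row in `LINE_LIST` are invented for illustration, and placing Iceberg under Lettuce is one reading of the flattened slide 8 diagram.

```python
from dataclasses import dataclass
from datetime import date

# Toy is_a links (child -> parent). "Spinach -> Leafy Greens" follows slide 14;
# the Lettuce/Iceberg links are an assumed reading of slide 8. This dict is a
# stand-in for a real ontology, not GenEpiO's actual format.
PARENT = {
    "Spinach": "Leafy Greens",
    "Lettuce": "Leafy Greens",
    "Iceberg": "Lettuce",
}


def rolls_up_to(term, target):
    """Return True if term is target or has target among its ancestors."""
    while term is not None:
        if term == target:
            return True
        term = PARENT.get(term)  # None once we run out of parents
    return False


@dataclass
class Case:
    case_id: str
    cluster: str
    onset: date
    exposure: str
    location: str


@dataclass
class CaseDefinition:
    cluster: str
    start: date
    end: date
    food_class: str
    location: str

    def matches(self, case):
        """A case matches only if it satisfies all four criteria."""
        return (
            case.cluster == self.cluster
            and self.start <= case.onset <= self.end
            and rolls_up_to(case.exposure, self.food_class)
            and case.location == self.location
        )


# Invented line-list rows, for illustration only.
LINE_LIST = [
    Case("c1", "Salmonella Cluster A", date(2015, 3, 2), "Spinach", "Vancouver"),
    Case("c2", "Salmonella Cluster A", date(2015, 3, 10), "Iceberg", "Vancouver"),
    Case("c3", "Salmonella Cluster A", date(2015, 3, 20), "Spinach", "Vancouver"),
    Case("c4", "Salmonella Cluster A", date(2015, 3, 5), "Nuggets", "Vancouver"),
    Case("c5", "Salmonella Cluster B", date(2015, 3, 5), "Spinach", "Vancouver"),
]

definition = CaseDefinition(
    cluster="Salmonella Cluster A",
    start=date(2015, 3, 1),
    end=date(2015, 3, 15),
    food_class="Leafy Greens",
    location="Vancouver",
)

matching_ids = [c.case_id for c in LINE_LIST if definition.matches(c)]
print(matching_ids)
```

Because the food criterion is checked against the hierarchy and not against a literal string, a case recorded as "Iceberg" is still picked up by a definition written in terms of "Leafy Greens". Changing the case definition only means constructing a new `CaseDefinition` and re-running the filter.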

Editor's Notes

  • Ontology: a way of organizing information in a hierarchy of well defined terms that are interconnected with logical relationships
    Well defined, reuse terms from different domains, IDs to disambiguate meaning and control for synonyms
    Integrates different data types, extra information layer provides “knowledge-generation engine”
    Taxonomy differences (domesticated vs wild types, between countries eg spinach not the same plant in Africa as North America)
    Relationships between consumers and food consumed
    Relationships specifying food processing, preservation, distribution
    Relationships describing how consumer and pathogen can interact eg transmission routes
    Provides means for automation of routine processes, improved querying
  • Genomic Epidemiology Requires a Lot of Different Types of Contextual Data.
    Conducted interviews to create user profiles (to identify user capabilities, expectations and requirements) and understand information flow
    To define the different users' needs and requirements:
    bioinformatics training and expertise
    types of software they use
    daily activities and duties
    issues and concerns regarding current systems
    requirements for a WGS platform

    PH Users include:
    BC PHMRL
    Epidemiologists
    Environmental Microbiologists
    Medical Microbiologists
    Bioinformaticians

  • “Person, place, time”
    Exposure, food items, geographical information, symptoms, onset of symptoms
    Created (manually in Excel) on an ad hoc basis per investigation
    Need to be shared between stakeholders, but data governance is an issue
  • What sets IRIDA apart, in addition to being a unique collaboration between different types of collaborators, is its use of standards throughout the platform.
  • Much easier and more effective to collect metadata prospectively than to collect it retrospectively from different lab notebooks, databases, health authorities (have to ask for permission)
    Prompts user to input epidemiologically useful info at point of sample intake/prior to submission (benefitting NEXT user)
    Facilitates use of common language that can be shared

  • Archiving, select cases as case definition changes
  • Create a smaller core (Lab, Epi exposure, and Food) ontology for line-list testing
    Create a consortium for group to take on different domains of Genomic Epidemiology Application Ontology
    Pursuing longer term funding for ontology
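
The first note above describes an ontology as well-defined terms with IDs, synonyms, and hierarchical relationships. A minimal Python sketch of that idea follows. The `EX:` identifiers, the `TERMS` layout, and the function names are hypothetical and are not GenEpiO's real IDs or API; the labels and the "Cold Cuts" synonym come from the slides, and the placement of Deli Meats and Nuggets under Poultry is one reading of the flattened slide 7 diagram.

```python
# Hypothetical term table: ID -> label, parent ID, and synonyms.
# The "EX:" IDs are made up for this sketch.
TERMS = {
    "EX:0001": {"label": "Produce", "is_a": None, "synonyms": []},
    "EX:0002": {"label": "Leafy Greens", "is_a": "EX:0001", "synonyms": []},
    "EX:0003": {"label": "Spinach", "is_a": "EX:0002", "synonyms": []},
    "EX:0004": {"label": "Poultry", "is_a": None, "synonyms": []},
    "EX:0005": {"label": "Deli Meats", "is_a": "EX:0004", "synonyms": ["Cold Cuts"]},
    "EX:0006": {"label": "Nuggets", "is_a": "EX:0004", "synonyms": []},
}


def resolve(name):
    """Map a label or synonym (case-insensitive) to its term ID, or None."""
    key = name.strip().lower()
    for term_id, term in TERMS.items():
        names = [term["label"], *term["synonyms"]]
        if key in (n.lower() for n in names):
            return term_id
    return None


def lineage(term_id):
    """Return labels from the term up to its root, most specific first."""
    labels = []
    while term_id is not None:
        term = TERMS[term_id]
        labels.append(term["label"])
        term_id = term["is_a"]
    return labels


def descendants(term_id):
    """Return the IDs of all terms below term_id in the hierarchy."""
    found = []
    for child_id, term in TERMS.items():
        if term["is_a"] == term_id:
            found.append(child_id)
            found.extend(descendants(child_id))
    return found


print(resolve("cold cuts"))   # same ID as "Deli Meats"
print(lineage("EX:0003"))     # Spinach up to Produce
print([TERMS[t]["label"] for t in descendants("EX:0004")])
```

Resolving free-text names to IDs before storage is what lets "Cold Cuts" and "Deli Meats" be treated as the same exposure, and walking `is_a` links is what lets a query at a coarse level (Poultry) find records entered at a finer one (Nuggets).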