IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao

Published in: Health & Medicine

  1. IRIDA: Canada’s federated platform for genomic epidemiology. William Hsiao, Ph.D. (William.hsiao@bccdc.ca, @wlhsiao), BC Public Health Microbiology and Reference Laboratory and University of British Columbia. ABPHM 2015.
  2. Genome Canada Bioinformatics Competition: Large-Scale Project “A Federated Bioinformatics Platform for Public Health Microbial Genomics”. Our goal: the IRIDA platform (Integrated Rapid Infectious Disease Analysis), an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. www.IRIDA.ca
  3. Each year, one in eight Canadians (or four million people) gets sick with a domestically acquired food-borne illness.
  4. Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies
     – Project Team has direct access to state of the art research in academia
     – Project Team is directly embedded in user organization
  5. Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
  6. GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
  7. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
     • Carefully designed and engineered software platform is just the starting point…
     • Diagram components: User Interface, Security, File system, Metadata Storage, Application logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation
  8. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
     • Easy to use interface hiding the technical details
  9. Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data
  10. Solution 1b: Build Portable and Transparent Pipelines
      • Use Galaxy as workflow engine – large community support
      • Retooled to address usability, security, and other limitations
      • Version Controlled Pipeline Templates
      • Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution
      • Results and provenance information are copied from Galaxy
      • Diagram steps: 1. Input files sent to Galaxy; 2. Tools executed on Galaxy workers; 3. Results downloaded from Galaxy
      • Diagram components: IRIDA UI/DB, REST API, Shared File System, Galaxy (Assembly Tools, Variant Calling Tools, …), Workers
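
The three diagram steps on this slide (send inputs, execute on Galaxy workers, copy back results with provenance) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not IRIDA's or Galaxy's actual API: `PipelineTemplate`, `FakeGalaxy`, `run_pipeline`, and `toy_assembler` are hypothetical names, and the "tool" is a stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PipelineTemplate:
    """A version-controlled pipeline description (hypothetical structure)."""
    name: str
    version: str
    parameters: Dict[str, str]


@dataclass
class FakeGalaxy:
    """Stand-in for the workflow engine; a real Galaxy is reached over its REST API."""
    tool: Callable[[List[str], Dict[str, str]], str]
    uploaded: List[str] = field(default_factory=list)

    def upload(self, files: List[str]) -> None:
        # Step 1: input files are sent to the engine.
        self.uploaded = list(files)

    def execute(self, template: PipelineTemplate) -> Dict[str, object]:
        # Step 2: the tool runs on the engine's workers.
        output = self.tool(self.uploaded, template.parameters)
        # Provenance records what ran, with which version, parameters, and inputs.
        provenance = {
            "pipeline": template.name,
            "version": template.version,
            "parameters": dict(template.parameters),
            "inputs": list(self.uploaded),
        }
        return {"output": output, "provenance": provenance}


def run_pipeline(engine: FakeGalaxy, template: PipelineTemplate, files: List[str]):
    engine.upload(files)
    # Step 3: results and provenance are copied back to the caller.
    return engine.execute(template)


def toy_assembler(files: List[str], params: Dict[str, str]) -> str:
    # Hypothetical tool: only reports what it was given.
    return f"assembled {len(files)} files with k={params['kmer']}"


template = PipelineTemplate("assembly", "0.1.0", {"kmer": "31"})
result = run_pipeline(FakeGalaxy(toy_assembler), template, ["a_R1.fastq", "a_R2.fastq"])
print(result["output"])
print(result["provenance"]["version"])
```

Keeping the template immutable and storing its version in the provenance record is one way to make a result traceable to the exact pipeline description that produced it.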
  11. Solution 1c: Start the training NOW!
      • Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators
      • IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016
      • We would like to hear about other training initiatives and share experience and training material
  12. GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
  13. Many Players in surveillance and outbreak – ineffective information sharing
      • Diagram (source: M. Taylor, BCCDC): Provincial public health dept., National laboratory, Local public health dept., Provincial laboratory, Cases, Physicians, Frontline lab
      • Diagram labels: Information; Bioinformatics and Analytical Capacities
  14. Many Systems used in Reporting Diseases – require data re-entry and re-coding
      • Diagram (source: M. Taylor, BCCDC): National Ministry of Health, Provincial public health dept., National laboratory, Local public health dept., Provincial laboratory, Cases, Physicians, Local laboratory
      • Reporting channels shown: Fax/Electronic Fax Phone/Fax Electronic/Paper Electronic/Fax/Phone Mailing of Samples/Fax/Electronic
  15. IRIDA is designed with these dilemmas in mind
      • Solutions:
        – 2a: Localized instance of federated databases
        – 2b: Permission control – authentication/authorization for information sharing
        – 2c: User role-based display of information
  16. Solution 2a: Local/Cloud Instances and Data Federation
      • Data processing capacity pushed to data generating labs
      • Allow data sharing securely for enhanced analysis
      • Eventually cultivating a culture of openness in data sharing and collaborative development of tools
  17. Solution 2b: Security – Authorization
      • Local authorization per instance.
      • Method-level authorization.
      • Object-level authorization.
      • Allow secure, fine grained and flexible information sharing
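
The difference between method-level and object-level authorization can be illustrated with a small sketch. It is not IRIDA's implementation: `User`, `Project`, `requires_role`, `can_read`, and the `"analyst"` role are hypothetical names chosen for the example. The decorator decides whether a caller may invoke an operation at all; the membership check decides whether the caller may touch one specific object.

```python
from dataclasses import dataclass, field
from functools import wraps
from typing import Set


@dataclass
class User:
    name: str
    roles: Set[str]


@dataclass
class Project:
    name: str
    members: Set[str] = field(default_factory=set)


def requires_role(role):
    """Method-level check: the caller must hold `role` to invoke the function."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise PermissionError(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


def can_read(user, project):
    """Object-level check: the caller must be a member of this specific project."""
    return user.name in project.members


@requires_role("analyst")
def read_project(user, project):
    if not can_read(user, project):
        raise PermissionError(f"{user.name} cannot read {project.name}")
    return f"{user.name} read {project.name}"


alice = User("alice", {"analyst"})
outbreak = Project("outbreak-1", {"alice"})
print(read_project(alice, outbreak))
```

A user who holds the role but is not a project member passes the first check and fails the second, which is what makes the sharing fine grained.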
  18. Solution 2c: Role-based Dynamic Display driven by Ontology
      • Ontologies often lack a content management system (CMS)
      • An Interface Model Ontology (IFM) can define a CMS for an ontology
  19. IFM Interface View Permissions: Detailed View vs. Restricted View
      • E.g., user role permissions control visibility and editing of content
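
A role-based display of the kind described on slides 18 and 19 amounts to filtering a record by what each role is allowed to see. The sketch below is an assumption-laden illustration: it uses a plain dictionary (`FIELD_VISIBILITY`) where the slides describe an ontology-driven model, and the field names and roles are invented for the example.

```python
# Hypothetical mapping from metadata field to the roles allowed to view it.
FIELD_VISIBILITY = {
    "sample_id": {"epidemiologist", "lab_technologist"},
    "organism": {"epidemiologist", "lab_technologist"},
    "patient_age": {"epidemiologist"},
    "postal_code": {"epidemiologist"},
}


def view_for(role, record):
    """Return only the fields of `record` that `role` may see.

    Fields missing from FIELD_VISIBILITY are hidden by default.
    """
    return {
        key: value
        for key, value in record.items()
        if role in FIELD_VISIBILITY.get(key, set())
    }


record = {
    "sample_id": "S-001",
    "organism": "Salmonella enterica",
    "patient_age": 42,
    "postal_code": "V5Z",
}

detailed = view_for("epidemiologist", record)     # detailed view
restricted = view_for("lab_technologist", record)  # restricted view
print(sorted(detailed))
print(sorted(restricted))
```

Hiding unknown fields by default is a deliberate choice here: a newly added field stays invisible until someone assigns it to a role.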
  20. GAP 3: INFORMATION REPRESENTATION IS INCONSISTENT
  21. Solution 3a: Use Ontology
      • Ontology: a way to describe types of entities and relations between them
      • Why use ontology
        – Ontology is flexible and expandable
        – Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low levels of compliance and adoption
        – Free text is used as an alternative but is not computing friendly
        – Ontology and semantic web technologies may be a solution
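
To make "types of entities and relations between them" concrete, here is a toy sketch with a single `is_a` relation. The food terms and hierarchy are invented for illustration and are not taken from any of the ontologies named in the talk. The point is that a query for a broad category also finds samples annotated with narrower terms, which free text cannot do reliably.

```python
# Hypothetical is_a hierarchy: child term -> parent term.
IS_A = {
    "spinach": "leafy green",
    "romaine lettuce": "leafy green",
    "leafy green": "produce",
    "ground beef": "meat product",
    "produce": "food product",
    "meat product": "food product",
}


def ancestors(term):
    """Walk the is_a chain upward and return all broader terms."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_kind_of(term, category):
    """True if `term` equals `category` or is a descendant of it."""
    return term == category or category in ancestors(term)


samples = {"S1": "spinach", "S2": "ground beef", "S3": "romaine lettuce"}
leafy = sorted(s for s, term in samples.items() if is_kind_of(term, "leafy green"))
print(leafy)
```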
  22. Many Domains of Knowledge are needed to describe an outbreak investigation
      • Build On, Work With: OBI; TypON; NGSOnto; NIAID-GSC-BRC core metadata; MIxS Ontology; NCBI Biosample etc; TRANS – Pathogen Transmission; EPO Exposure Ontology; Infectious Disease Ontology; CARD, ARO for AMR; USDA Nutrient DB; EFSA Comp. Food Consump. DB
      • Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.
  23. Lab Checklist/Ontology
      • Currently finishing a lab/genomics checklist and starting an epidemiology checklist
      • Metadata Domains:
        – Sample Collection
        – Sample Source
        – Environmental
        – Lab Analytics
        – Sequencing Process / QC
        – Sequencing Run / QC
        – Assembly Process / QC
        – Others overlapping with Epi: Demographic / Geographic / etc.
  24. GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
  25. Solution 4a: Use of QA/QC in IRIDA
      • Software Engineering
        – High quality software that meets regulatory guidelines
        – Open Source product to ensure “white box” testing
        – Ontology driven software development
        – Follow proper software development cycle
      • Data Quality
        – Built-in modules to check for input data quality
        – Warnings and feedback during pipeline execution to laboratory technologists
        – Use of Ontology to check metadata (non-genomic data) quality
      • Analytic Tool Quality
        – Utilize validation datasets
        – Use of abstract pipeline description – with version control
        – Periodic analysis of exceptions and boundary cases to assess tool accuracy
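
The "Data Quality" bullets describe two kinds of checks: one on the input sequence data and one on the accompanying metadata. A minimal sketch of both follows. It assumes simple four-line FASTQ records and an invented set of required metadata fields (`REQUIRED_FIELDS`); neither the function names nor the rules come from IRIDA itself.

```python
def check_fastq(text):
    """Return a list of warnings for a FASTQ string (empty list: nothing flagged)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        return ["line count is not a multiple of 4"]
    warnings = []
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        n = i // 4 + 1  # 1-based record number for readable messages
        if not header.startswith("@"):
            warnings.append(f"record {n}: header does not start with '@'")
        if not sep.startswith("+"):
            warnings.append(f"record {n}: separator does not start with '+'")
        if len(seq) != len(qual):
            warnings.append(f"record {n}: sequence and quality lengths differ")
    return warnings


# Assumed checklist of mandatory metadata fields (hypothetical).
REQUIRED_FIELDS = {"sample_id", "collection_date", "sample_source"}


def check_metadata(record):
    """Return one warning per required field missing from `record`."""
    missing = sorted(REQUIRED_FIELDS - set(record))
    return [f"missing field: {name}" for name in missing]


good = "@read1\nACGT\n+\nIIII\n"
bad = "@read1\nACGT\n+\nIII\n"
print(check_fastq(good))
print(check_fastq(bad))
print(check_metadata({"sample_id": "S-001"}))
```

Returning warnings as a list, instead of raising on the first problem, matches the slide's idea of giving feedback to laboratory technologists during execution.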
  26. Solution 4b: Generation of validation datasets. To participate, contact Rene Hendriksen (rshe@food.dtu.dk) or Errol Strain (Errol.Strain@fda.hhs.gov). http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
  27. Solution 4c: Exploratory tools can access certain data via REST API securely
      • IslandViewer: http://pathogenomics.sfu.ca/islandviewer (Dhillon and Laird et al. 2015, Nucleic Acids Research)
      • GenGIS: http://kiwi.cs.dal.ca/GenGIS (Parks et al. 2013, PLoS One)
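
As a rough idea of what "access via REST API securely" might look like from an external tool, the sketch below builds (but does not send) an authenticated HTTPS request with Python's standard library. The host `irida.example.org`, the `/projects` path, and the bearer-token scheme are all assumptions made for illustration; the slides do not specify IRIDA's endpoints or its authentication mechanism.

```python
import urllib.request


def build_request(base_url, path, token):
    """Build a GET request that carries a bearer token, refusing plain HTTP."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if not url.startswith("https://"):
        raise ValueError("refusing to attach a token to a non-HTTPS URL")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )


# Hypothetical instance URL and resource path.
req = build_request("https://irida.example.org/api", "/projects", "TOKEN")
print(req.full_url)
print(req.get_method())
```

Sending the request would be a call to `urllib.request.urlopen(req)` against a real instance, which is omitted here because no real endpoint is known.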
  28. Availability
      • Jun 1 2015: IRIDA 1.0 beta Internal Release
        – Release to collaborators for installation and full test
      • Jul 1 2015: IRIDA 1.0 beta1
        – Announce Beta release, download, documentation available on website – www.irida.ca
      • Aug 1 2015: IRIDA 1.0 beta2
        – Cloud installer, with documentation
        – Additional pipelines as available
        – Visualization as available
  29. Acknowledgements
      • Project Leaders: Fiona Brinkman (SFU), Will Hsiao (PHMRL), Gary Van Domselaar (NML)
      • University of Lisbon: João Carriço
      • National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer
      • Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall
      • Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo
      • BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella
      • University of Maryland: Lynn Schriml
      • Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert
      • Dalhousie University: Rob Beiko, Alex Keddy
      • McMaster University: Andrew McArthur, Daim Sardar
      • European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid
      • European Food Safety Authority: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina
  30. IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015
  31. The IRIDA platform (Integrated Rapid Infectious Disease Analysis): an open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. Contacts: William.hsiao@bccdc.ca, @wlhsiao. www.IRIDA.ca