IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao, Ph.D.
William.hsiao@bccdc.ca
@wlhsiao
BC Public Health Microbiology and Reference Laboratory
and University of British Columbia
ABPHM 2015
Genome Canada Bioinformatics Competition: Large-Scale Project
“A Federated Bioinformatics Platform for Public Health Microbial Genomics”
Our Goal
The IRIDA platform
(Integrated Rapid Infectious Disease Analysis)
An open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations
www.IRIDA.ca
Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.
A partnership among public health agencies and academic institutes to bridge the gap between advances in genomic epidemiology and their application to real-life, real-time use cases in public health agencies
- The project team has direct access to state-of-the-art research in academia
- The project team is directly embedded in the user organizations
[Figure: partners: National Public Health Agency, Provincial Public Health Agency, Academic/Public]
Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
Solution 1a: Build a user-friendly, high-quality analysis platform to process genomic data
• A carefully designed and engineered software platform is just the starting point…
[Figure: IRIDA architecture: User Interface, Security, File system, Metadata, Storage, Application logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation]
• An easy-to-use interface hides the technical details
Solution 1b: Build Portable and Transparent Pipelines
• Use Galaxy as the workflow engine: large community support
• Retool Galaxy to address usability, security, and other limitations
• Version-controlled pipeline templates
• Input files, parameters, and workflow are sent to an IRIDA-specific Galaxy instance for execution
• Results and provenance information are copied back from Galaxy
[Figure: IRIDA UI/DB exchanges data with Galaxy (Assembly Tools, Variant Calling Tools, …) via a REST API and a shared file system. 1. Input files sent to Galaxy; 2. Tools executed on Galaxy workers; 3. Results downloaded from Galaxy]
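The three-step round trip in the diagram can be sketched in Python. This is a minimal illustration under stated assumptions: `AnalysisSubmission`, `InMemoryWorkflowEngine`, and `run_pipeline` are hypothetical names, the engine is an in-memory stand-in rather than Galaxy or its API, and recording a SHA-256 checksum per input is only one plausible way to capture provenance.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class AnalysisSubmission:
    """Hypothetical record of one pipeline run (not IRIDA's data model)."""
    workflow_id: str          # identifier of a version-controlled pipeline template
    parameters: dict
    inputs: dict              # file name -> file contents (bytes)
    results: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)


class InMemoryWorkflowEngine:
    """Stand-in for an IRIDA-specific Galaxy instance; this is NOT the Galaxy API."""

    def __init__(self, tools):
        self.tools = tools    # workflow_id -> callable(inputs, parameters) -> outputs
        self.jobs = {}

    def upload(self, submission):
        # Step 1: input files, parameters, and workflow are sent for execution.
        job_id = f"job{len(self.jobs) + 1}"
        self.jobs[job_id] = {"submission": submission, "outputs": None}
        return job_id

    def run(self, job_id):
        # Step 2: the tool runs on a "worker" (here, just a local function call).
        sub = self.jobs[job_id]["submission"]
        self.jobs[job_id]["outputs"] = self.tools[sub.workflow_id](sub.inputs, sub.parameters)

    def download(self, job_id):
        return self.jobs[job_id]["outputs"]


def run_pipeline(engine, submission):
    job_id = engine.upload(submission)
    engine.run(job_id)
    # Step 3: results and provenance are copied back to the platform.
    submission.results = engine.download(job_id)
    for name, data in submission.inputs.items():
        submission.provenance.append({
            "input": name,
            "sha256": hashlib.sha256(data).hexdigest(),
            "workflow": submission.workflow_id,
            "parameters": dict(submission.parameters),
        })
    return submission
```

Keeping the workflow identifier and parameters alongside each input checksum means a result can later be traced to exactly which template and settings produced it, which is the point of version-controlled pipelines.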
Solution 1c: Start the training NOW!
• Canada’s National Microbiology Laboratory has hosted genomics workshops for partners and collaborators
• The IRIDA project has dedicated funding to host workshops in Q4 of 2015 and 2016
• We would like to hear about other training initiatives and to share experience and training materials
GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
Many players in surveillance and outbreak response: ineffective information sharing
Source: M. Taylor, BCCDC
[Figure: information flow among cases, physicians, frontline labs, local and provincial public health departments, the provincial laboratory, and the national laboratory (axis labels: Information; Bioinformatics and Analytical Capacities)]
Many systems are used in reporting diseases, requiring data re-entry and re-coding
[Figure: reporting paths among cases, physicians, the local laboratory, local and provincial public health departments, the provincial laboratory, the national laboratory, and the National Ministry of Health; links use fax, phone, paper, electronic reporting, and mailing of samples]
Source: M. Taylor, BCCDC
IRIDA is designed with these dilemmas in mind
• Solutions:
– 2a: Localized instances of federated databases
– 2b: Permission control: authentication/authorization for information sharing
– 2c: User role-based display of information
Solution 2a: Local/Cloud Instances and Data Federation
• Data processing capacity is pushed to data-generating labs
• Data can be shared securely for enhanced analysis
• Eventually, this cultivates a culture of open data sharing and collaborative tool development
Solution 2b: Security (Authorization)
• Local authorization per instance
• Method-level authorization
• Object-level authorization
• Allows secure, fine-grained, and flexible information sharing
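Method-level and object-level authorization can be contrasted in a short Python sketch. All names here (`User`, `Project`, `requires_role`, `can_read`, and the `USER`/`ADMIN` roles) are hypothetical and do not describe IRIDA's actual security implementation. The idea shown is only that a method-level check gates who may call an operation at all, while an object-level check decides whether a specific record is visible to that caller; because users and grants live in this one process, the checks are also local to a single instance.

```python
from dataclasses import dataclass, field
from functools import wraps


class AccessDenied(Exception):
    pass


@dataclass
class User:
    name: str
    roles: set


@dataclass
class Project:
    name: str
    owner: str
    shared_with: set = field(default_factory=set)   # user names granted read access


def requires_role(role):
    """Method-level check: the caller must hold `role` to invoke the function at all."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


def can_read(user, project):
    """Object-level check: is this particular project visible to this user?"""
    return "ADMIN" in user.roles or user.name in (project.owner, *project.shared_with)


@requires_role("USER")
def read_project(user, project):
    if not can_read(user, project):
        raise AccessDenied(f"{user.name} cannot read {project.name}")
    return project.name


@requires_role("USER")
def share_project(user, project, other_name):
    # Only the owner (or an admin) may extend read access to another user.
    if user.name != project.owner and "ADMIN" not in user.roles:
        raise AccessDenied(f"{user.name} cannot share {project.name}")
    project.shared_with.add(other_name)
```

Separating the two layers keeps sharing fine-grained: granting one project to a collaborator does not widen what that collaborator's role may do elsewhere.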
Solution 2c: Role-based Dynamic Display Driven by Ontology
• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an ontology
[Figure: IFM, Interface, View Permissions; Detailed View vs. Restricted View]
E.g., user role permissions control the visibility and editing of content
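A rough sketch of role-driven display follows. Here a plain Python dictionary stands in for the Interface Model Ontology, and the role names, rank ordering, and field names are all hypothetical. A restricted view shows only the fields a low-ranked role may see, while a detailed view for a higher-ranked role also exposes, and may allow editing of, the remaining fields.

```python
# Hypothetical interface model: each metadata field is annotated with the
# minimum role needed to view it and to edit it. Ranks are illustrative only.
ROLE_RANK = {"public": 0, "epidemiologist": 1, "lab_manager": 2}

INTERFACE_MODEL = {
    "organism":         {"view": "public",         "edit": "lab_manager"},
    "serotype":         {"view": "public",         "edit": "lab_manager"},
    "collection_date":  {"view": "epidemiologist", "edit": "lab_manager"},
    "patient_postcode": {"view": "lab_manager",    "edit": "lab_manager"},
}


def render(record, role):
    """Return the fields `role` may see, each flagged as editable or read-only."""
    rank = ROLE_RANK[role]
    view = {}
    for name, value in record.items():
        rule = INTERFACE_MODEL.get(name)
        if rule is None or ROLE_RANK[rule["view"]] > rank:
            continue  # fields not in the model, or above this role, are hidden
        view[name] = {"value": value, "editable": ROLE_RANK[rule["edit"]] <= rank}
    return view
```

Driving the display from a declarative model, rather than hard-coding it per screen, means adding a field or changing who sees it is a data change, not a code change.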
GAP 3: INFORMATION REPRESENTATION IS INCONSISTENT
Solution 3a: Use Ontology
• Ontology: a way to describe types of entities and the relations between them
• Why use an ontology?
– Ontologies are flexible and expandable
– Approaches with lower expressivity (e.g., controlled vocabularies, data dictionaries) are heavy-handed and show low levels of compliance and adoption
– Free text, often used as the alternative, is not computer-friendly
– Ontologies and semantic web technologies may be a solution
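Why relations between terms matter can be shown with a toy example. The terms and `is_a` links below are illustrative labels, not taken from any ontology named in this deck, and real ontologies identify terms by stable IDs rather than labels. A query for "meat product" finds samples annotated with more specific terms, which a free-text field would not support without manual interpretation.

```python
# Toy ontology: each term points to its broader parent via an "is_a" relation.
# Labels are illustrative only.
IS_A = {
    "ground beef": "beef product",
    "beef product": "meat product",
    "chicken breast": "poultry product",
    "poultry product": "meat product",
    "meat product": "food product",
    "lettuce": "leafy vegetable",
    "leafy vegetable": "food product",
}


def ancestors(term):
    """Follow is_a links upward and return every broader term, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def matches(term, query):
    """True if `term` is `query` itself or a subtype of it."""
    return term == query or query in ancestors(term)


samples = {"S1": "ground beef", "S2": "lettuce", "S3": "chicken breast"}
meat_samples = sorted(s for s, food in samples.items() if matches(food, "meat product"))
```

Here `meat_samples` collects S1 and S3 even though neither sample was annotated with the literal term "meat product".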
Many domains of knowledge are needed to describe an outbreak investigation
Build on, work with:
• OBI
• TypON
• NGSOnto
• NIAID-GSC-BRC core metadata
• MIxS Ontology
• NCBI BioSample, etc.
• TRANS: Pathogen Transmission
• EPO
• Exposure Ontology
• Infectious Disease Ontology
• CARD, ARO for AMR
• USDA Nutrient DB
• EFSA Comprehensive Food Consumption DB
Example gaps to be filled: expand the food ontology; expand CARD AMR data with others.
Lab Checklist/Ontology
• Currently finishing a lab/genomics checklist and
starting an epidemiology checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process/QC
– Sequencing Run/QC
– Assembly Process/QC
– Others overlapping with epidemiology: Demographic, Geographic, etc.
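One simple use of such a checklist is a completeness report per metadata domain. The sketch assumes hypothetical required field names for three of the domains listed above; an actual checklist would define its own fields and would ideally also validate values against ontology terms.

```python
# Hypothetical checklist: required fields grouped by metadata domain.
CHECKLIST = {
    "Sample Collection": ["collection_date", "collected_by"],
    "Sample Source": ["isolation_source", "host"],
    "Sequencing Run/QC": ["instrument_model", "run_id"],
}


def missing_fields(record, checklist=CHECKLIST):
    """Return {domain: [required fields that are absent or blank]} for one sample."""
    report = {}
    for domain, fields in checklist.items():
        missing = [f for f in fields if not str(record.get(f, "")).strip()]
        if missing:
            report[domain] = missing
    return report
```

Grouping the report by domain lets the lab see at a glance whether gaps sit in collection, source, or sequencing metadata.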
GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
Solution 4a: Use of QA/QC in IRIDA
• Software Engineering
– High-quality software that meets regulatory guidelines
– Open-source product to allow “white box” testing
– Ontology-driven software development
– A proper software development life cycle
• Data Quality
– Built-in modules to check input data quality
– Warnings and feedback to laboratory technologists during pipeline execution
– Use of ontologies to check the quality of (non-genomic) metadata
• Analytic Tool Quality
– Use of validation datasets
– Abstract, version-controlled pipeline descriptions
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
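As an illustration of a built-in input-quality check that returns warnings rather than failing silently, the sketch below flags reads whose mean base quality is low. It assumes uncompressed four-line FASTQ records with Phred+33 quality encoding; the threshold of 30 is an arbitrary example value, not an IRIDA setting.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def check_fastq(text, min_mean_q=30.0):
    """Return human-readable warnings for low-quality or malformed reads.

    Assumes simple 4-line FASTQ records (no wrapped sequence lines).
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        return ["truncated file: line count is not a multiple of 4"]
    warnings = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if (not header.startswith("@") or not plus.startswith("+")
                or not qual or len(seq) != len(qual)):
            warnings.append(f"malformed record at line {i + 1}")
            continue
        q = mean_phred(qual)
        if q < min_mean_q:
            warnings.append(f"{header[1:]}: mean quality {q:.1f} < {min_mean_q}")
    return warnings
```

For example, a read whose quality string is `IIII` scores 40 and passes, while `####` scores 2 and produces a warning.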
Solution 4b: Generation of validation datasets
To participate, contact Rene Hendriksen (rshe@food.dtu.dk) or Errol Strain (Errol.Strain@fda.hhs.gov)
http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
Solution 4c: Exploratory tools can securely access certain data via the REST API
IslandViewer: http://pathogenomics.sfu.ca/islandviewer (Dhillon and Laird et al. 2015, Nucleic Acids Research)
GenGIS: http://kiwi.cs.dal.ca/GenGIS (Parks et al. 2013, PLoS One)
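A sketch of how an external tool might make an authenticated call to a REST API, using only the Python standard library. The base URL, the `projects` path, and the use of a bearer token are assumptions for illustration, not a description of IRIDA's documented endpoints or authentication flow. `build_request` sends nothing, so it can be inspected offline; `fetch_json` performs the actual call.

```python
import json
import urllib.request

# Hypothetical instance URL; substitute your own, and obtain the token through
# the instance's own authentication flow.
BASE_URL = "https://irida.example.org/api"


def build_request(path, token):
    """Build an authenticated GET request for `path` (nothing is sent yet)."""
    req = urllib.request.Request(f"{BASE_URL}/{path.lstrip('/')}", method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


def fetch_json(path, token, timeout=30):
    """Send the request over HTTPS (assumed by BASE_URL) and decode a JSON body."""
    with urllib.request.urlopen(build_request(path, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping credentials in a request header, rather than in the URL, keeps them out of server logs and browser history.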
Availability
• Jun 1, 2015: IRIDA 1.0 beta internal release
– Released to collaborators for installation and full testing
• Jul 1, 2015: IRIDA 1.0 beta1
– Beta release announced; download and documentation available on the website (www.irida.ca)
• Aug 1, 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available
Acknowledgements
Project Leaders
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olson
Tarah Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Morag Graham
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Rob Beiko
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Authority
Leibana Criado Ernesto
Vernazza Francesco
Rizzi Valentina
IRIDA Annual General Meeting
Winnipeg, April 8-9, 2015
Contacts:
William.hsiao@bccdc.ca
@wlhsiao
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 

IRIDA: Canada’s federated platform for genomic epidemiology

  • 1. IRIDA: Canada’s federated platform for genomic epidemiology William Hsiao, Ph.D. William.hsiao@bccdc.ca @wlhsiao BC Public Health Microbiology and Reference Laboratory and University of British Columbia ABPHM 2015
  • 2. Genome Canada Bioinformatics Competition: Large-Scale Project “A Federated Bioinformatics Platform for Public Health Microbial Genomics”. Our Goal: the IRIDA platform (Integrated Rapid Infectious Disease Analysis), an open source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. www.IRIDA.ca
  • 3. Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.
  • 4. Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and their application to real-life and real-time use cases in public health agencies - Project Team has direct access to state-of-the-art research in academia - Project Team is directly embedded in the user organizations. Partners: National Public Health Agency; Provincial Public Health Agency; Academic/Public
  • 5. Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies
  • 6. GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS
  • 7. Solution 1a: Build a User-Friendly, High-Quality Analysis Platform to Process Genomics Data • A carefully designed and engineered software platform is just the starting point… Components: User Interface, Security, File System, Metadata, Storage, Application Logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation
  • 8. Solution 1a: Build a User-Friendly, High-Quality Analysis Platform to Process Genomics Data • Easy-to-use interface hiding the technical details
  • 9. Solution 1a: Build a User-Friendly, High-Quality Analysis Platform to Process Genomics Data
  • 10. Solution 1b: Build Portable and Transparent Pipelines • Use Galaxy as the workflow engine – large community support • Retool Galaxy to address usability, security, and other limitations • Version-controlled pipeline templates • Input files, parameters, and workflow are sent to an IRIDA-specific Galaxy for execution • Results and provenance information are copied back from Galaxy. Diagram: IRIDA UI/DB; REST API; Shared File System; Galaxy (Assembly Tools, Variant Calling Tools, …) with Workers. Steps: 1. Input files sent to Galaxy; 2. Tools executed on Galaxy workers; 3. Results downloaded from Galaxy
  • 11. Solution 1c: Start the training NOW! • Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators • IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016 • We would like to hear about other training initiatives and share experience and training material
  • 12. GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC
  • 13. Many players in surveillance and outbreak response – ineffective information sharing. Diagram: Cases, Physicians, Frontline lab, Local public health dept., Provincial laboratory, Provincial public health dept., National laboratory; labels: Information; Bioinformatics and Analytical Capacities. Source: M. Taylor, BCCDC
  • 14. Many systems used in reporting diseases – require data re-entry and re-coding. Diagram: National Ministry of Health, Provincial public health dept., National laboratory, Local public health dept., Provincial laboratory, Cases, Physicians, Local laboratory; reporting channels: Fax/Electronic, Fax, Phone/Fax, Electronic/Paper, Electronic/Fax/Phone, Mailing of Samples/Fax/Electronic. Source: M. Taylor, BCCDC
  • 15. IRIDA is designed with these dilemmas in mind • Solutions: – 2a: Localized instances of federated databases – 2b: Permission control: authentication/authorization for information sharing – 2c: User role-based display of information
  • 16. Solution 2a: Local/Cloud Instances and Data Federation • Data processing capacity pushed to data-generating labs • Secure data sharing for enhanced analysis • Eventually, cultivating a culture of openness in data sharing and collaborative tool development
  • 17. Solution 2b: Security (Authorization) • Local authorization per instance • Method-level authorization • Object-level authorization • Allows secure, fine-grained and flexible information sharing
  • 18. Solution 2c: Role-based Dynamic Display driven by Ontology • Ontologies often lack a content management system (CMS) • An Interface Model Ontology (IFM) can define a CMS for an ontology
  • 19. Diagram: IFM, Interface, View Permissions, Detailed View, Restricted View. E.g., user role permissions control visibility and editing of content
  • 21. Solution 3a: Use Ontology • Ontology: a way to describe types of entities and the relations between them • Why use ontology – Ontology is flexible and expandable – Lower levels of expressivity (e.g. controlled vocabularies, data dictionaries) are heavy-handed and show low compliance and adoption – Free text, used as an alternative, is not computing-friendly – Ontology and semantic web technologies may be a solution
  • 22. Many Domains of Knowledge are needed to describe an outbreak investigation Build On, Work With: OBI TypON NGSOnto NIAID-GSC-BRC core metadata MIxS Ontology NCBI Biosample etc TRANS – Pathogen Transmission EPO Exposure Ontology Infectious Disease Ontology CARD, ARO for AMR USDA Nutrient DB EFSA Comp. Food Consump. DB Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.
  • 23. Lab Checklist/Ontology • Currently finishing a lab/genomics checklist and starting an epidemiology checklist • Metadata Domains: – Sample Collection – Sample Source – Environmental – Lab Analytics – Sequencing Process /QC – Sequencing Run /QC – Assembly Process / QC – Others overlapping with Epi: Demographic / Geographic / etc.
  • 24. GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING
  • 25. Solution 4a: Use of QA/QC in IRIDA • Software Engineering – High-quality software that meets regulatory guidelines – Open source product to ensure “white box” testing – Ontology-driven software development – Follow a proper software development cycle • Data Quality – Built-in modules to check input data quality – Warnings and feedback to laboratory technologists during pipeline execution – Use of ontology to check the quality of (non-genomic) metadata • Analytic Tool Quality – Utilize validation datasets – Use of abstract pipeline descriptions, with version control – Periodic analysis of exceptions and boundary cases to assess tool accuracy
  • 26. Solution 4b: Generation of validation datasets To Participate, Contact Rene Hendriksen rshe@food.dtu.dk Or Errol Strain Errol.Strain@fda.hhs.gov http://www.globalmicrobialidentifier.org/Workgroups#work-group-4
  • 27. Solution 4c: Exploratory tools can securely access certain data via the REST API. IslandViewer: http://pathogenomics.sfu.ca/islandviewer (Dhillon and Laird et al. 2015, Nucleic Acids Research); GenGIS: http://kiwi.cs.dal.ca/GenGIS (Parks et al. 2013, PLoS One)
  • 28. Availability • Jun 1 2015: IRIDA 1.0 beta Internal Release – Release to collaborators for installation and full test • Jul 1 2015: IRIDA 1.0 beta1 – Announce Beta release, download, documentation available on website – www.irida.ca • Aug 1 2015: IRIDA 1.0 beta2 – Cloud installer, with documentation – Additional pipelines as available – Visualization as available
  • 29. Acknowledgements. Project Leaders: Fiona Brinkman (SFU), Will Hsiao (PHMRL), Gary Van Domselaar (NML). University of Lisbon: João Carriço. National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer. Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall. Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo. BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella. University of Maryland: Lynn Schriml. Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert. Dalhousie University: Rob Beiko, Alex Keddy. McMaster University: Andrew McArthur, Daim Sardar. European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid. European Food Safety Authority: Ernesto Leibana Criado, Francesco Vernazza, Valentina Rizzi.
  • 30. IRIDA Annual General Meeting, Winnipeg, April 8-9, 2015
  • 31. The IRIDA platform (Integrated Rapid Infectious Disease Analysis): an open source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations. Contacts: William.hsiao@bccdc.ca, @wlhsiao, www.IRIDA.ca
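Slides 17 and 19 list method-level and object-level authorization and role-dependent views, but not how the two levels fit together. The following is a minimal Python sketch, not IRIDA's implementation, assuming a hypothetical model in which each sample belongs to exactly one project and each user holds a per-project role. `ProjectRole`, `can_read`, `can_modify`, `requires`, `view_sample` and `rename_sample` are all illustrative names.

```python
from dataclasses import dataclass, field
from enum import Enum
from functools import wraps
from typing import Callable, Dict


class ProjectRole(Enum):
    """Hypothetical per-project roles (not IRIDA's actual role names)."""
    COLLABORATOR = "collaborator"  # may read the project's samples
    OWNER = "owner"                # may read and modify the project's samples


@dataclass
class User:
    name: str
    is_admin: bool = False  # administrator of this local instance
    project_roles: Dict[str, ProjectRole] = field(default_factory=dict)


@dataclass
class Sample:
    name: str
    project: str  # assumption: every sample belongs to exactly one project


def can_read(user: User, sample: Sample) -> bool:
    """Object-level rule: any role in the sample's project grants read access."""
    return user.is_admin or sample.project in user.project_roles


def can_modify(user: User, sample: Sample) -> bool:
    """Object-level rule: only owners of the sample's project may modify it."""
    return user.is_admin or user.project_roles.get(sample.project) is ProjectRole.OWNER


def requires(rule: Callable[[User, Sample], bool]):
    """Method-level guard: the decorated operation runs only if `rule` allows it."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: User, sample: Sample, *args, **kwargs):
            if not rule(user, sample):
                raise PermissionError(f"{user.name} may not {func.__name__} {sample.name}")
            return func(user, sample, *args, **kwargs)
        return wrapper
    return decorator


@requires(can_read)
def view_sample(user: User, sample: Sample) -> str:
    return f"{sample.name} ({sample.project})"


@requires(can_modify)
def rename_sample(user: User, sample: Sample, new_name: str) -> Sample:
    sample.name = new_name
    return sample
```

In this sketch the object-level rules decide access per sample, while the decorator attaches a rule to each operation. A provincial epidemiologist with a collaborator role can therefore view a shared outbreak sample but cannot rename it. Keeping the two levels separate is one way to let each local instance tailor what it shares.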

Editor's Notes

  1. Today, I’d like to tell you a bit about Canada’s efforts to build a genomic epidemiology analysis platform.
  2. IRIDA was conceived about two years ago through a Genome Canada Bioinformatics grant. It is an effort to build an open source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time disease outbreak investigations, initially focused on food-borne illnesses.
  3. Despite our high food safety standards, each year one in eight Canadians gets food poisoning, costing the economy $4 billion. It is important to track the source and spread of the disease to prevent further illness.
  4. IRIDA is a partnership among provincial public health agencies, national public health agencies and academic institutes to bridge the gap between advancements in genomic epidemiology and real-life, real-time use cases in public health agencies. The project team has direct access to state-of-the-art research in academia and is directly embedded in the user organizations.
  5. Since we have access to the end users, we interviewed these subject experts to identify the barriers to the uptake of genomic epidemiology in public health agencies. We interviewed epidemiologists, lab scientists and technologists, medical microbiologists and lab administrators. For the rest of the presentation, I’ll talk about some of the gaps we identified and how IRIDA can meet the requirements.
  6. The first gap, which should not be a surprise to this audience, is that public health workers are mostly unfamiliar with genomics and with the bioinformatics analysis needed to process and interpret genomic data.
  7. We do believe that, in the long run, adequate training in genomics is needed to bridge this gap, and that in the short term experts such as yourselves are needed as stopgaps. Meanwhile, a high-quality analysis platform that automates data processing and applies consistent analysis protocols will help ease the transition. However, a carefully designed and engineered software platform is just the starting point, and there will no doubt be many similar platforms to choose from. So I will touch on some of the more interesting design philosophies we have for IRIDA.
  8. We found that in the diagnostic testing world, complex procedures with lots of options lead to more human errors and more non-compliance. So one design principle we stress is a simple user interface that hides the technical details. This solution, of course, can’t stand on its own, and I’ll describe measures to ensure that flexibility and scientific rigor are maintained.
  9. We think a user interface should be like a joke: if you have to explain it, then it’s not good. That said, we do have extensive documentation for the administrators and accreditation auditors who don’t like jokes :P
  10. The next solution is to use Galaxy, which has large community support and a large user base, as our pipeline engine. We had to retool Galaxy extensively to address usability, security and other limitations. To achieve this, we built the IRIDA platform on top of the Galaxy engine: input files, parameters and workflows are sent to Galaxy for execution, and results and pipeline provenance information are copied back into the IRIDA database for
  11. To address the knowledge gaps in genomics, we have started training our public health lab workers on genomic analysis. We would like to hear about other training initiatives and will be happy to share our experience and training material
  12. The second gap we identified is that sharing information within and between organizations is highly inefficient and often involves sending Excel files with deleted columns to hide sensitive information.
  13. There are many players involved in infectious disease surveillance and outbreak investigation. However, concerns about privacy and confidentiality (both founded and unfounded) mean that information tends to be aggregated and lost as it moves from the frontline labs to public health and reference labs, even though bioinformatics and analytical capacities are most abundant in central labs and academia.
  14. Moreover, different institutions use different software, and data is often exported, printed, faxed, then re-imported into a new system by re-typing! This is a huge waste of time and a source of errors.
  15. IRIDA has a few designs to deal with these issues, and I’ll highlight 3 here.
  16. First, we propose pushing data processing capacity to the periphery, where the data is richest, by encouraging local or private cloud instances of the IRIDA platform. This way our partners are not obligated to give up their data. The different instances are connected via a federated database schema, so data can be shared securely and easily, allowing enhanced analysis by genomic experts located centrally. The more we share successfully, the more likely people are to realize the benefit of sharing, and this can lead to a new culture of openness.
  17. Second, we have built-in mechanisms for authentication and authorization at different levels to allow secure and fine-grained information sharing. This allows parties to customize the data they share according to their material and data transfer agreements.
  18. Third, we realized we need a flexible user interface to present the data. Therefore, we are developing an interface model ontology that defines a content management system.
  19. For example, users will see the content of the database displayed differently depending on their role.
  20. The third gap we identified is that information representation is inconsistent across organizations
  21. Given the richness and complexity of genomic epidemiological data, we opt to use and develop ontologies compliant with OBO Foundry to describe the data; Currently, lower levels of expressivity such as controlled vocabularies and data dictionaries are used but they tend to be heavy handed and show low level of compliance and adoption. We believe ontology and semantic web technologies can make data sharing across heterogeneous systems and platforms more tractable.
  22. There are many domains of knowledge needed to describe an outbreak investigation and we strive to re-use existing standards as much as possible
  23. Currently we are finishing a lab/genomic checklist and will be starting an epidemiology checklist soon
  24. Lastly, as Jon and others mentioned yesterday, genomic data interpretation is complex and the technology is still evolving, yet in the world of diagnostic labs, accreditation means standardized protocols need to be developed.
  25. So we focus quite a bit of our energy on developing high-quality software with built-in QA and QC components to assess data quality and analytic tool performance.
  26. I also want to highlight the effort of the Global Microbial Identifier (GMI) Work Group 4 to develop proficiency tests for wet labs and analysis pipelines. To participate, you can contact Rene or Errol.
  27. To facilitate tool improvement and to allow exploratory analyses that are not part of IRIDA pipelines, we also allow pre-authorized tools to connect securely to IRIDA via a REST API. Currently we have two such external tools, for genomic island detection and phylogeography analysis.
  28. The software will be released to a few international collaborators for full testing by Jun 1. Then, on Jul 1, we plan to release the beta version publicly so people can try it out. Of course, the software will be free, and we would love to collaborate with people on both the software and the ontology development.
  29. A large group of people contributed to this work.
  30. We also have a wonderful group of advisors
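Note 25 and slide 25 mention built-in modules that check input data quality and feed warnings back to laboratory technologists, without saying which checks are run. The following is a minimal Python sketch of one such check, not IRIDA's actual QC module, assuming Phred+33 FASTQ input. The thresholds (mean Q20 per read, at most 10% low-quality reads) and the names `parse_fastq`, `mean_quality`, `QcReport` and `check_reads` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumed Sanger / Illumina 1.8+ FASTQ quality encoding


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from 4-line FASTQ records, validating structure."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    if len(records) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(records), 4):
        header, seq, plus, qual = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred score of one read's quality string."""
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


@dataclass
class QcReport:
    reads: int
    low_quality_reads: int
    warnings: List[str] = field(default_factory=list)


def check_reads(
    lines: Iterable[str],
    min_mean_q: float = 20.0,       # hypothetical per-read threshold
    max_low_fraction: float = 0.1,  # hypothetical tolerated share of low-quality reads
    min_reads: int = 1,
) -> QcReport:
    """Quality problems become warnings for a technologist; malformed records raise ValueError."""
    report = QcReport(reads=0, low_quality_reads=0)
    for _read_id, _seq, qual in parse_fastq(lines):
        report.reads += 1
        if not qual or mean_quality(qual) < min_mean_q:
            report.low_quality_reads += 1
    if report.reads < min_reads:
        report.warnings.append(f"only {report.reads} reads; expected at least {min_reads}")
    if report.reads and report.low_quality_reads / report.reads > max_low_fraction:
        report.warnings.append(
            f"{report.low_quality_reads}/{report.reads} reads have mean quality "
            f"below Q{min_mean_q:g}"
        )
    return report
```

Structural errors stop processing, while quality problems only produce warnings. This mirrors the distinction on slide 25 between hard validation and feedback during pipeline execution. As a usage example, `check_reads(open("reads.fastq"))` (the file name is hypothetical) returns a `QcReport` whose `warnings` list could be shown to the technologist before an analysis is launched.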