1. IRIDA: Canada’s federated platform for genomic epidemiology
William Hsiao, Ph.D.
William.hsiao@bccdc.ca
@wlhsiao
BC Centre for Disease Control Public Health Laboratory
and University of British Columbia
2. IRIDA Platform Overview
• IRIDA = Integrated Rapid Infectious Disease Analysis
• A free, open-source, standards-compliant, high-quality genomic epidemiology analysis platform to support real-time disease outbreak investigations
Core Functions:
• Management of strain and genomic sequence data
• Rapid processing and analysis of genomic data
• Informative display of genomic results
• Management of sample, case, and aggregate data (“metadata”)
Target audience:
• Public health agencies who need a platform to manage and
process genomic data
• Public health agencies who need a platform to use genomics for
outbreak investigations
[Diagram: IRIDA components: Sequencing Instruments; Web Application; Data Management; Built-in Analytical Tools; External Galaxy; Command-line Tools]
3. 10 simple rules (wish list) to build a better public health microbiology genomic epidemiology analysis system
Download the latest version at https://github.com/phac-nml/irida
4. 1: Engage the Users Through the Entire Software Development Cycle
[Diagram: National Public Health Agency; Provincial Public Health Agency; Academic/Public]
- Project team has direct access to state-of-the-art research in academia
- Project team is directly embedded in the user organization
5. 2: Have A Simple User Interface
Line List View (under testing)
Timeline View (Conceptualization)
Selectable fields
Travel
Symptoms and Onset
Exposure Types
Hospitalization
Launch a pipeline
Be Like
6. 3: Build a Robust, Extensible Platform
• IRIDA uses Galaxy to manage workflows
• Adding additional pipelines is relatively easy
• Uses a standard API to allow third-party tools to obtain data from IRIDA (e.g. IslandViewer and GenGIS)
[Architecture diagram: IRIDA; Servlet Container; REST API; Central File Storage; Web Interface; Application Logic; Compute Cluster; Galaxy]
http://www.pathogenomics.sfu.ca/islandviewer/
http://kiwi.cs.dal.ca/GenGIS/Main_Page
7. 4: Have Extensive Documentation
• Documentation should be available for:
• Users – step-by-step tutorials with screenshots / FAQ
• System Administrators – installation instructions / issue trackers
• Developers – open source, collaborative development / IRC channel
• Easily Accessible at https://irida.corefacility.ca/documentation/
8. 5: Implement QC Throughout the Whole Application
• Genomics is sensitive and sequence data are inherently noisy
• Genomics is a rapidly advancing technology
• Standardizing pipelines is difficult and can stifle innovation
• Better to standardize the performance and reporting metrics and ensure any validated pipelines meet the testing criteria
• Developing a general QC testing module (RCQC) that uses an ontology to standardize QC metrics (https://github.com/Public-Health-Bioinformatics/rcqc)
• Data provenance and version control (data + pipelines) are musts for diagnostic labs
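The idea of standardizing metrics rather than pipelines can be sketched as follows. This is only an illustration, not RCQC's actual interface: the metric names, the thresholds, and the `QCCriterion` and `evaluate_qc` names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QCCriterion:
    """One standardized QC metric with an acceptable range (hypothetical)."""
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Hypothetical criteria; real thresholds would come from a validated protocol.
CRITERIA = [
    QCCriterion("mean_coverage", minimum=30.0),
    QCCriterion("contig_count", maximum=500),
    QCCriterion("n50", minimum=50_000),
]


def evaluate_qc(report: dict[str, float], criteria=CRITERIA) -> dict[str, str]:
    """Return PASS, FAIL, or MISSING per metric for one pipeline's report."""
    results = {}
    for criterion in criteria:
        if criterion.metric not in report:
            results[criterion.metric] = "MISSING"
        elif criterion.passes(report[criterion.metric]):
            results[criterion.metric] = "PASS"
        else:
            results[criterion.metric] = "FAIL"
    return results


# Two different pipelines report against the same metric names,
# so their results can be compared directly.
pipeline_a = {"mean_coverage": 85.2, "contig_count": 120, "n50": 210_000}
pipeline_b = {"mean_coverage": 22.0, "contig_count": 640}

qc_a = evaluate_qc(pipeline_a)
qc_b = evaluate_qc(pipeline_b)
```

Because both pipelines are judged against the same named criteria, either can be swapped or upgraded without changing how results are reported.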
9. 6: Build to Enable Collaboration
• Be able to compare pipelines
• Pipeline implemented using Galaxy – transparent
and shareable
• Define QC criteria using ontology to compare the
different pipelines of the same purpose
• Be able to share data in standard formats to
minimize data re-entry from one platform to
another
• Federation of platforms using standard API to
share data and analysis results
10. 7: Use Compatible Data Standards
• Sequence data are more compatible / shareable, but metadata are currently siloed and incompatible
• Collaboration and sharing are difficult when data are incompatible
• Compatibility != Sameness
• Use an ontology to allow customization of term lists, but all terms with the same meaning (semantics) should have the same universal ID (e.g. a URL) to facilitate mapping of terms
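A minimal sketch of "Compatibility != Sameness": two labs keep their own term lists, but each label maps to a shared ID, so records can be compared and translated. The example.org identifiers, the term lists, and the function names are hypothetical placeholders, not real ontology terms.

```python
# Hypothetical shared identifiers; a real deployment would use IDs
# published by an ontology, not example.org placeholders.
FEVER = "http://example.org/term/0001"
DIARRHEA = "http://example.org/term/0002"

# Each lab keeps its own preferred labels (customized term lists),
# but labels with the same meaning point at the same ID.
LAB_A_TERMS = {"fever": FEVER, "diarrhea": DIARRHEA}
LAB_B_TERMS = {"pyrexia": FEVER, "diarrhoea": DIARRHEA}


def to_shared_ids(symptoms, term_list):
    """Translate local labels to shared IDs; unknown labels raise KeyError."""
    return {term_list[label.strip().lower()] for label in symptoms}


def translate(symptoms, source_terms, target_terms):
    """Re-express one lab's labels in another lab's vocabulary."""
    reverse = {term_id: label for label, term_id in target_terms.items()}
    return sorted(reverse[i] for i in to_shared_ids(symptoms, source_terms))


case_a = ["Fever", "Diarrhea"]
case_b = ["diarrhoea", "pyrexia"]

# The two cases use different words but carry the same meaning.
same_meaning = to_shared_ids(case_a, LAB_A_TERMS) == to_shared_ids(case_b, LAB_B_TERMS)
as_lab_b = translate(case_a, LAB_A_TERMS, LAB_B_TERMS)
```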
11. 8: Implement Fine Grained Access Control
Detailed View Restricted View
E.g. User role permissions control visibility and editing of content
Authorization
• Industry-standard authentication and authorization mechanisms
• Local authorization per instance
• Method-level authorization
• Object-level authorization
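The difference between method-level and object-level authorization can be sketched as below. This is a generic illustration, not IRIDA's implementation: the roles, the `requires_role` decorator, and the `Project` model are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from functools import wraps


class Forbidden(Exception):
    """Raised when a user lacks permission for an action."""


@dataclass
class User:
    name: str
    role: str  # hypothetical roles: "admin", "manager", "user"


@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)  # names of users who may view


def requires_role(*allowed):
    """Method-level check: only the listed roles may call the function."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if user.role not in allowed:
                raise Forbidden(f"{user.name} may not call {func.__name__}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


def can_view(user, project):
    """Object-level check: admins see everything, others only their projects."""
    return user.role == "admin" or user.name in project.members


@requires_role("admin", "manager")
def add_member(user, project, new_member):
    # The decorator has already checked the role; now check this object.
    if not can_view(user, project):
        raise Forbidden(f"{user.name} has no access to {project.name}")
    project.members.add(new_member)


alice = User("alice", "manager")
bob = User("bob", "user")
outbreak = Project("outbreak-demo", members={"alice"})

add_member(alice, outbreak, "bob")      # allowed: role and membership both pass
bob_can_view = can_view(bob, outbreak)  # bob is now a member

try:
    add_member(bob, outbreak, "carol")  # rejected at the method level
    bob_can_edit = True
except Forbidden:
    bob_can_edit = False
```

Keeping the two checks separate means a role can grant the right to perform an action in general while each project still decides who may touch it.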
12. 9: Use Technology to Safeguard Patient Privacy
It’s easy to lose control of the Excel line list: someone can make a copy of the content and pass it around without your knowledge; typos are common and cumulative!
Technology can control who sees what and when.
Separate out sensitive patient data from pathogen sequence data, but be able to bring them together when necessary without resorting to emailing line lists!
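One way to picture this separation is two stores linked only by a sample ID, with patient fields joined in on demand for permitted roles. The sketch below is an assumption-laden illustration: the records are made up, and the role names and `line_list` function are hypothetical.

```python
# Two separate stores linked only by a sample ID (all records are made up).
SEQUENCE_DATA = {
    "S1": {"organism": "Salmonella enterica", "cluster": "A"},
    "S2": {"organism": "Salmonella enterica", "cluster": "A"},
}
PATIENT_DATA = {
    "S1": {"age_group": "20-29", "onset": "2016-05-01"},
    "S2": {"age_group": "60-69", "onset": "2016-05-03"},
}

# Hypothetical roles allowed to see patient fields.
ROLES_WITH_PATIENT_ACCESS = {"epidemiologist"}


def line_list(role):
    """Build a line list on demand; patient fields are joined in only
    for roles that are allowed to see them."""
    rows = []
    for sample_id, seq in sorted(SEQUENCE_DATA.items()):
        row = {"sample": sample_id, **seq}
        if role in ROLES_WITH_PATIENT_ACCESS:
            row.update(PATIENT_DATA.get(sample_id, {}))
        rows.append(row)
    return rows


restricted = line_list("bioinformatician")  # sequence fields only
detailed = line_list("epidemiologist")      # sequence plus patient fields
```

Since the combined view is generated per request, no standalone copy of the full line list needs to circulate.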
13. 10: Have Multiple, Flexible Access Options
• No one size fits all solution; Having many platforms to choose from is
a good thing (but data should be portable across platforms!)
• IRIDA is available in several different flavours:
Local Install
• Advantages: Full control of the system; your data never leave your centre
• Disadvantages: Computing infrastructure and IT support needed to maintain the resource
Virtual Machine
• Advantages: Full control of the system; easy to set up
• Disadvantages: Not really scalable if run on your own desktop; some performance loss
Cloud Instance
• Advantages: Full control of the system; does not require local computing infrastructure
• Disadvantages: Data go into a cloud environment; uploading to a cloud environment can be slow
Public Version
• Advantages: No setup required; upload your data and have it processed using Compute Canada resources
• Disadvantages: Data go into a public instance (data remain private to your account); upload can be slow
14. Acknowledgements
Project Leaders
Fiona Brinkman – SFU
Will Hsiao – PHMRL
Gary Van Domselaar – NML
University of Lisbon
João Carriço
National Microbiology Laboratory (NML)
Franklin Bristow
Aaron Petkau
Thomas Matthews
Josh Adam
Adam Olson
Tarah Lynch
Shaun Tyler
Philip Mabon
Philip Au
Celine Nadon
Matthew Stuart-Edwards
Morag Graham
Chrystal Berry
Lorelee Tschetter
Aleisha Reimer
Laboratory for Foodborne Zoonoses (LFZ)
Eduardo Taboada
Peter Kruczkiewicz
Chad Laing
Vic Gannon
Matthew Whiteside
Ross Duncan
Steven Mutschall
Simon Fraser University (SFU)
Melanie Courtot
Emma Griffiths
Geoff Winsor
Julie Shay
Matthew Laird
Bhav Dhillon
Raymond Lo
BC Public Health Microbiology &
Reference Laboratory (PHMRL) and BC
Centre for Disease Control (BCCDC)
Judy Isaac-Renton
Patrick Tang
Natalie Prystajecky
Jennifer Gardy
Damion Dooley
Linda Hoang
Kim MacDonald
Yin Chang
Eleni Galanis
Marsha Taylor
Cletus D’Souza
Ana Paccagnella
University of Maryland
Lynn Schriml
Canadian Food Inspection Agency (CFIA)
Burton Blais
Catherine Carrillo
Dominic Lambert
Dalhousie University
Rob Beiko
Alex Keddy
McMaster University
Andrew McArthur
Daim Sardar
European Nucleotide Archive
Guy Cochrane
Petra ten Hoopen
Clara Amid
European Food Safety Authority
Leibana Criado Ernesto
Vernazza Francesco
Rizzi Valentina
Inspired by Jenn’s keynote, I reworked my slides in the 10 simple rules format.
Many systems are and will be available for analyzing public health microbiology data, and we have seen a few throughout this conference.
So I thought I’d present what I think are some of the rules and my wish list for building a better public health genomic epidemiology platform,
highlighting how some of this thinking applies to our implementation of a platform.
Some of these rules have been implemented well in other applications.
A large group of people contributed to this work.