Beating Bugs with Big Data: Harnessing HPC
to Realize the Potential of Genomics in the
NHS in Wales
Dr Tom Connor
Bioinformatics Lead, Public Health Wales,
Reader, Cardiff University,
Group Leader, Quadram Institute Bioscience
Supported by
Funding Sources
MRC CLIMB
Welsh Government
Public Health Wales NHS Trust
Microbes in the Food Chain
Acknowledgements
Other PHW Colleagues
Dr Matt Backx
Dr Catherine Moore
Trefor Morris
Michael Perry
Dr Harriet Hughes
Dr Noel Craine
Dr Simon Cottrell
Helen Adams
David Heyburn
Fatima Downing
Sue Edwards
Cardiff University
Dr Matt Bull
Dr Anna Price
The ARCCA team
CU IT Networking
CLIMB Collaborators
Professor Nick Loman
Simon Thompson
Radoslaw Poplawski
Simon Ellwood-Thompson
PHW Pathogen Genomics
Dr Sally Corden
Joanne Watkins
Lee Graham
Alec Birchley
Bree Wilcox
Jason Coombes
Lauren Gilbert
PHW Genomics
Workstreams (ARGENT,
DIGEST, WCM, HIV,
Influenza)
You are here
Wales is here
Basic Orientation
The NHS in Wales
• National Health Service, universal
healthcare, free at the point of use
• The NHS is a devolved area
• Public Health Wales is one of the 11
organisations that make up the NHS in
Wales
• Has a wide remit including:
• Runs the microbiological diagnostic labs
for most of Wales’ hospitals and GPs
• Public health and surveillance
• PHW offers fully national services for the
whole 3.25 million population of Wales
Why does microbiology matter?
• Infectious disease accounts for ~7%
of all deaths in the UK and over
£30bn of economic costs per year
• Antibiotic resistance alone has a
projected cumulative global cost of
$100 Trillion by 2050
• We need better tools and systems to
prevent, diagnose, track and treat
infectious disease
Quick biology lesson; introducing microbial genomes
A bacterial cell
The bacterial chromosome for this
bacterial cell – its blueprint
That blueprint includes the plans for
proteins (“genes”) and other information
relating to how and when these need to
be made
This genetic information is the
organism's genome
So, what is Genomics?
Put simply, genomics is the branch of
biology that is concerned with studying
genomes
However, what really defines the field is
the use of sequencing instruments
Sequencing instruments read DNA and,
by extension, enable us to read the
genome of an organism of interest
From that we can make predictions
about the organism of interest, or
compare it with others
That means genomics has a lot of potential in healthcare
Faster results
More clinically
actionable information
per test
Clearer, better
diagnostics
- Genomics enables medicine to become more personalized
Simultaneously
- Genomics also gives us unprecedented tools to track and
combat outbreaks/epidemics on a population level
Data is digital and has many
uses
Why now? The reduction in the raw cost of sequencing the
human genome
[Figure: cost of sequencing a human genome, 2003 to 2019]
* Humans are boring. For the
same money we can sequence
~50 bacterial genomes
Genomics Partnership Wales
• In July 2017 the Welsh Government launched the
Genomics for Precision Medicine Strategy, with over
£10M spent so far
• Covers both human and pathogen genomics
• PHW is leading the Pathogen Genomics elements,
with a Pathogen Genomics Unit launched in 2018
• Current development areas
– AMR bacterial surveillance and characterisation
– Cystic Fibrosis polymicrobial infection
diagnostics
– Enterovirus surveillance
• Production systems
– C. difficile surveillance and outbreak support
– Mycobacteria identification and characterisation
– Influenza surveillance
– HIV susceptibility testing
Accreditation in process
System design/needs assessment in
progress
Pilot system development
The (clinical) sequencing process has 5 main elements
1. Sample to Sequencer
2. Sequencing
3. Reads to reports
4. Interpretation
5. Fixing it when it breaks
Labwork Data science/Bioinformatics
Sequencing is now relatively cheap and easy; we can
sequence large numbers of strains for modest amounts
of money
However, the major costs and difficulties do not lie with
the generation of data; they lie with how we share, store
and analyse the data we generate
Bioinformatics expertise
User accessibility of software/hardware
Appropriate compute capacity
Software development
Storage availability
Network capacity
These account for up to 90% of the costs of doing genomics work
Our problem: The sequencing iceberg
[Figure: Wave 1, Wave 2 and Wave 3, with date ranges between 1937 and 2009]
Mutreja, Kim, Thomson, Connor et al, Nature, 2011
Pretty picture
320 samples, approx. 600-700GB uncompressed data
• Sequence assembly
– Each job: 4-8GB RAM, 1 CPU core
– Each job generates intermediate files of ~6GB
– Runtime: 1+ hours/job
• Sequence mapping
– 320 jobs
– Each job: 4GB RAM, 1 CPU core
– Each job generates intermediate files of ~3GB
– Runtime: 1 hour/job
• Phylogenomics
– 1 job, 1+ cores, up to 128GB RAM
– Intermediate file size ~2+GB
– Output file ~2GB
– Runtime: 1-2 days
• Virulence and antimicrobial resistance screening
– 320 jobs, single core, 100MB RAM
– Runtime: 5 mins/job
– Generates 10-20 small files per job
• Bayesian modelling
– 3 jobs, 1+ cores, up to 1GB RAM
– CPU intensive
– Runtime: 2 days per job
– Output file ~10GB
– Can use GPUs
– Written in Java
Workload types: Larger RAM HTC; HTC; HTC; HPC; Possibly HPC
Not so pretty workload
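To make the scale of this mixed workload concrete, the per-stage figures above can be tallied in a few lines of Python. This is a rough sketch, not a description of the actual scheduler setup: it assumes one core per job, takes the lower bound wherever the slide gives a range or a "+", and assumes one assembly job per sample (the slide does not give a job count for assembly). Stages with no stated intermediate file size are counted as zero.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    jobs: int
    hours_per_job: float       # lower bound where the slide says "1+" or gives a range
    scratch_gb_per_job: float  # intermediate files only; 0.0 where no size is given


STAGES = [
    Stage("assembly", 320, 1.0, 6.0),        # job count is an assumption: one per sample
    Stage("mapping", 320, 1.0, 3.0),
    Stage("phylogenomics", 1, 24.0, 2.0),    # lower bound of the 1-2 day runtime
    Stage("amr_screening", 320, 5 / 60, 0.0),
    Stage("bayesian", 3, 48.0, 0.0),
]


def core_hours(stage: Stage) -> float:
    """Single-core hours consumed by every job in a stage."""
    return stage.jobs * stage.hours_per_job


def scratch_gb(stage: Stage) -> float:
    """Intermediate storage needed if all jobs in a stage keep their files."""
    return stage.jobs * stage.scratch_gb_per_job


def summarise(stages):
    """Map stage name to (core-hours rounded to 0.1, scratch GB)."""
    return {s.name: (round(core_hours(s), 1), scratch_gb(s)) for s in stages}


if __name__ == "__main__":
    for name, (hours, gb) in summarise(STAGES).items():
        print(f"{name:15s} {hours:8.1f} core-hours {gb:8.1f} GB scratch")
```

Under these assumptions the 320-sample study needs roughly 835 core-hours and about 2.9TB of intermediate storage as a minimum, with most of the storage pressure coming from the many small assembly and mapping jobs rather than from the single large phylogenomics job.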
@M04531:39:000000000-AVVU9:1:1101:12907:2147 1:N:0:52
CCCGATGTTGCGCACGCCCTGGTTGGCGAAACCATAGTTGGCGCTGCCGGCATTGCCGAACCCGACGTTGAAGTTGCCCACATCCGCCAACCCGATG
TTGAGGATGGGGATCTGGTTCAACGCGGTCCCGGCCGCAGACACGCCCGACAGCTGATGGCCGACGTTGCCGAGGCCCGACAGCACCGCCGGCGT
CCCGAGCGGCAACACGCTGGTGTTGTAGATCCCCGAGACACCCGAGCCGACGTTGAGCA
+
BAAAABBFFFFBGGGGGGGGGGGGDGHGGGGGHHHHHHHHHGGGGGGHGCCGCFHHHG?EGGGG?EGEFHHHDGGHHHHHGGHHHGGGGGG
GGGGCDCGB1FHHGHGHGEDGHHGHHHHHHGGFGGGGHFGGGGGADGFGGGGEFGGGFFFFFFFFBFFFFFFFFFFFFC=FAFFFFFFFFFFFFFFFFFF
FFAFF.BCBBDAFFFFFFFFFFBADEFFFFFFFFFFFFAABFF?9>D;DA->B=EE.FF/
@M04531:39:000000000-AVVU9:1:1101:20177:2174 1:N:0:52
GTTTTCGTCGCGATCGCCCACGAGACCGAGGCTGATCTCGTATGCCGTCTTCTGCTTGAAACAAAAAAGCCGTACCCACCATTCGCACCCGCGTACTC
ACCACTCTTACCGTCCTACTCACCCTTTGCTTTTTGCCCCGGCTTTATTCCTGTCGACGCTACTCCTTCCCCCCCCGCTGCGTGCTCTCCATCCCCCTCC
TCCGAGACCTGCTTTACTTCCGGGCCTTGTTCGCTGTCTTCTCCGCCTCTCTC
+
AAA?AF@FBDDDAEGEEEECCEAECE2AEFGGHA1AFGFFF?GHFH0EEHGGDDGHFH1333B?111/1B0/?//?F3/?/3?44//////<</>//1??G1/?01
1110.<...11<>1<C..<00=0<00<-<<0.:-;C-/000;0009/..-9;..-0;/000/9/.-;;;@--;-...9/////;;A.9.-A..9/99--..9.99////////.---9./9//;.-
..;//9////.....9///
@M04531:39:000000000-AVVU9:1:1101:17824:2565 1:N:0:52
TTCTAATACTGTATCATCTGCTCCTGTGTCTAATAGGGCTTCTATCAGCCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAGGCATGATCTCGTATG
CCGTCTTCTGCTTGAAAAAAAAACTAAATAAACTGTTCACCATAAACGTGACTATTATGGCCATAAATAGTAACAATGTCTAGAGTAGATTCAATATGAAG
ACCGCTACATAATAGATAAATACGTCGTACGTCAACTTCGACAATCTGG
+
BBBBBFFFFFFFGGGGGGGGGGHGHHHHHHHHHHHHFHHHHHHHHBGFHHHHFHHHHHHHHHFHHHFHDHGEGCGCFFGGGCGHHHHGHGHHH
FFHHFGHHGHGGGHFHGHHGHGHCGFH@EEC/232B33332@2@>2>22@2<D22/0/??<<111111/?1<F111111>11<11111<00=0=DF00
0=D000000000//---::009;;000;000000.0.9-..-/...9900....-/;;/:
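The reads above are FASTQ records: a header line starting with @, the bases, a + separator, and one quality character per base. The sketch below shows how such a record can be decoded. It assumes the common Illumina header layout (instrument, run, flowcell, lane, tile, x, y, then read number, filter flag, control number and index) and Phred+33 quality encoding; the slide does not state either, so treat both as assumptions. The function and field names are illustrative only.

```python
from typing import NamedTuple


class ReadHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    filtered: bool
    control: int
    index: str


def parse_header(line: str) -> ReadHeader:
    """Split an Illumina-style FASTQ header into its fields (assumed layout)."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    left, right = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return ReadHeader(instrument, int(run), flowcell, int(lane), int(tile),
                      int(x), int(y), int(read), filtered == "Y",
                      int(control), index)


def phred_scores(quality: str, offset: int = 33) -> list:
    """Convert quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    header = parse_header("@M04531:39:000000000-AVVU9:1:1101:12907:2147 1:N:0:52")
    print(header)
    print(phred_scores("BAAAAB"))
```

Under this encoding the runs of G and H characters at the start of each read correspond to high-confidence base calls, while the runs of / and . characters towards the end of the second and third reads correspond to much lower scores, which is why downstream tools usually trim or filter reads before analysis.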
Why the complexity?
Next generation sequencing involves
taking many copies of a genome and
smashing it up into millions of
fragments that can be sequenced
simultaneously using a sequencing
instrument
We then have to put this back together
Think of it as a huge, complex jigsaw
puzzle, with most pieces being sky,
and a number of pieces being from
other puzzles
And, unlike in human genomics, we
often don’t know what the original
‘puzzle’ is meant to look like
Once we have rebuilt the puzzle, we
will have a range of questions we
want to ask
These questions require different
approaches to find the answers we
need
And microbial bioinformatics best
practice doesn’t exist yet
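The jigsaw analogy above can be sketched as a toy greedy overlap assembler: repeatedly join the two fragments whose ends overlap the most until nothing more can be joined. This is an illustration only, with a made-up 18-base sequence and error-free fragments; real assemblers use more sophisticated graph-based methods and have to cope with sequencing errors, repeats and contamination (the pieces "from other puzzles").

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads pairwise by largest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical toy "genome" and four overlapping fragments cut from it.
    genome = "ATGGCGTACCTGAAGTCA"
    fragments = [genome[0:8], genome[3:11], genome[6:14], genome[10:18]]
    print(greedy_assemble(fragments))
```

With real data the loop often stops with several contigs rather than one, which is one reason the assembly stage is followed by further analyses rather than being the end of the process.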
Our challenges
• We need to be able to rapidly process patient samples
• We need to be able to deal with large volumes of
complex data
• Need to be able to compare new samples with old
• Need to have locked down pipelines that don’t change
and can work anywhere
• Need to be able to deal with very mixed workloads
• Need the ability to scale our analyses
• Need to get around issues with physical infrastructure
(e.g. hospital networks)
• Need to be able to build upon what researchers have
done, but in a more enterprise way
• Need to do this in an environment (the NHS) which only
understands Windows
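One way to check that a locked-down pipeline really has not changed is to fingerprint its pinned tool versions and parameters and record that fingerprint with every report. The sketch below is a minimal illustration of that idea, not the system described in this talk; the tool names, versions and parameters are hypothetical.

```python
import hashlib
import json


def pipeline_fingerprint(tools: dict, params: dict) -> str:
    """SHA-256 over a canonical JSON encoding of tool versions and parameters."""
    payload = json.dumps({"tools": tools, "params": params},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical pinned versions and settings, for illustration only.
    tools = {"assembler": "1.2.3", "mapper": "0.7.1"}
    params = {"min_read_length": 50, "min_base_quality": 20}
    print(pipeline_fingerprint(tools, params))
```

Because the encoding is canonical (sorted keys, fixed separators), the same configuration gives the same fingerprint on any machine, and any change to a version or parameter gives a different one.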
BIOINFORMATICS
MRC CLIMB
• 4 sites (1 in Cardiff)
each with ~1920 vCPU
cores including 3x 3TB
RAM nodes
• 1.7PB Ceph/site and
500TB of GPFS
• Runs OpenStack
• Also enables data
sharing with researchers
PH CLIMB
• 1280 vCPU cores
• 1PB Object and 80TB SSD
block/posix storage running
Ceph
• Runs OpenStack and
Kubernetes
In-lab cluster
• 160 CPU core cluster
• ~300TB of medium tier
NFS storage
• 24TB of SSD storage
running Ceph
Data, Software
Enabling personalised medicine - HIV
• HIV is an RNA virus that attacks the immune
system and, if untreated, eventually causes AIDS
• HIV spread widely in the 1980s and attracted
a lot of hysteria
• Today, HIV can be controlled through
medication
• The medication that a patient takes targets key
proteins encoded by the virus
• For an individual patient, being on the right
drug is critical to keeping their HIV
controlled
• The mutations that create resistance to drugs
are encoded in the HIV genome
HIV resistance typing using genomics
• Our system enables a fully
automated process
• After the DNA has been loaded, the
next thing the lab team sees is the
reports and QC data
• Has underpinned an improvement
in turnaround time from an average
of 2 weeks to an average of 6 days
• Has halved the cost per sample
• Has added in integrase testing at
no extra cost, increasing the
amount of clinically actionable
information provided
• Used for all HIV patients in Wales
Fighting outbreaks in hospitals – Clostridioides difficile
• C. difficile is a key pathogen that
causes disease in hospitals
• Sometimes described as a
‘superbug’, it is often associated
with antibiotic use
• ~220,000 cases and almost
13,000 deaths in the US in 2017
from C. difficile
• It spreads easily, and can survive
in the vacuum of space – so is
hard to get rid of in hospitals
• Until now, tools for tracking have
been low-resolution, so don’t
allow for effective infection
control
Outbreak tracking and data integration with C. difficile
• Since 2017 we have been working to produce a service for C. difficile outbreak support
• Now in full parallel running
• Clustering genome sequences and linking this to other data enables patient-level
examination of causes and prevention measures
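Clustering genome sequences for outbreak support can be sketched as follows: count the single-nucleotide differences (SNPs) between each pair of aligned genomes, then group samples that are linked by pairs at or below a threshold. This is a minimal illustration with made-up sequences; the threshold of 2 SNPs and the single-linkage rule are assumptions, not the method or cut-off used by the service described here.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ (ignoring non-ACGT)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def cluster_samples(seqs: dict, max_snps: int = 2):
    """Single-linkage clusters: samples joined by any pair within max_snps."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical aligned sequences for four patient samples.
    samples = {
        "P1": "ACGTACGTAC",
        "P2": "ACGTACGTAC",
        "P3": "ACGTACGTAT",
        "P4": "TTGTACCTGC",
    }
    print(cluster_samples(samples))
```

Each resulting cluster can then be joined to ward, date and hospital records by sample identifier, which is the linking step that makes patient-level investigation possible.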
Tracking pathogens across healthcare systems
We can move beyond single patients to
look at clusters of disease
In North Wales, for example, 11/18
clusters crossed hospitals; the reason
for these large clusters is under
investigation
Explains why reducing C. difficile rates
is so hard – normally we only look
within hospitals
Genomics gives us a nationwide view
National and international surveillance of Influenza
• Not much we can do in terms of treatment – the key
for influenza is prevention
• Prevention is achieved through vaccination, so we
want to
– Get the vaccine right
– Undertake campaigns to promote vaccination as required
Proposal for a pilot study to Welsh government in October 2017
• Real time surveillance helps with both of these
• We were asked to build a service for Influenza
sequencing and surveillance in October 2017
• Went from a standing start to production clinical
service in less than 12 months with a team of 2
• Now have one of the fastest turnaround times in the
world for routine Influenza surveillance
• That was all underpinned by a system that has been
engineered for automation and reproducibility from
the ground up
Global collaboration for improved global health
• Lots of our work to date has been about
meeting local needs
• Two big things are coming internationally that we
will be feeding into
– First is a new system (called SP3) which will
provide a cloud agnostic unified pathogen
pipeline platform for use in clinical settings
• Will provide a system for the global community,
combining enterprise-grade software, documented
best practice and validation/verification datasets
– Second is the Public Health Alliance for Genomic
Epidemiology
• A new initiative to build standards and best practice
around microbial bioinformatics in public health
• These are larger open projects with significant
buy-in from national and international
organisations that will help spread the adoption
of genomics in public health
Summary
• Building clinical genomics services poses big
challenges in terms of infrastructure
• The system we have built, which integrates
HPC and on-premises cloud, has enabled us to
– Build world-leading clinical genomics services
– Analyse over 8,000 sequenced patient samples
in the last 12 months
– Support more than 10 distinct analysis
pipelines
– Track outbreaks across multiple species
– Provide better, faster diagnostics for multiple
patient groups in Wales
• The work we have done is now feeding into
international efforts to define and build upon
best practice in this area, in which
HPC/Infrastructure is a key component
[Figure labels: 0 SNPs; 1 SNP; Source; Cluster of Patient Samples]

 
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdfCHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
rishi2789
 
Cell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune DiseaseCell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune Disease
Health Advances
 

Recently uploaded (20)

CHEMOTHERAPY_RDP_CHAPTER 3_ANTIFUNGAL AGENT.pdf
CHEMOTHERAPY_RDP_CHAPTER 3_ANTIFUNGAL AGENT.pdfCHEMOTHERAPY_RDP_CHAPTER 3_ANTIFUNGAL AGENT.pdf
CHEMOTHERAPY_RDP_CHAPTER 3_ANTIFUNGAL AGENT.pdf
 
What are the different types of Dental implants.
What are the different types of Dental implants.What are the different types of Dental implants.
What are the different types of Dental implants.
 
Histololgy of Female Reproductive System.pptx
Histololgy of Female Reproductive System.pptxHistololgy of Female Reproductive System.pptx
Histololgy of Female Reproductive System.pptx
 
Travel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International TravelersTravel Clinic Cardiff: Health Advice for International Travelers
Travel Clinic Cardiff: Health Advice for International Travelers
 
Acute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdfAcute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdf
 
Skin Diseases That Happen During Summer.
 Skin Diseases That Happen During Summer. Skin Diseases That Happen During Summer.
Skin Diseases That Happen During Summer.
 
CHEMOTHERAPY_RDP_CHAPTER 2 _LEPROSY.pdf1
CHEMOTHERAPY_RDP_CHAPTER 2 _LEPROSY.pdf1CHEMOTHERAPY_RDP_CHAPTER 2 _LEPROSY.pdf1
CHEMOTHERAPY_RDP_CHAPTER 2 _LEPROSY.pdf1
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
 
Cervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptxCervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptx
 
Osteoporosis - Definition , Evaluation and Management .pdf
Osteoporosis - Definition , Evaluation and Management .pdfOsteoporosis - Definition , Evaluation and Management .pdf
Osteoporosis - Definition , Evaluation and Management .pdf
 
Ageing, the Elderly, Gerontology and Public Health
Ageing, the Elderly, Gerontology and Public HealthAgeing, the Elderly, Gerontology and Public Health
Ageing, the Elderly, Gerontology and Public Health
 
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USENARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
 
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan PatroJune 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
 
The Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of RespirationThe Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of Respiration
 
10 Benefits an EPCR Software should Bring to EMS Organizations
10 Benefits an EPCR Software should Bring to EMS Organizations   10 Benefits an EPCR Software should Bring to EMS Organizations
10 Benefits an EPCR Software should Bring to EMS Organizations
 
How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.How to choose the best dermatologists in Indore.
How to choose the best dermatologists in Indore.
 
Nano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory projectNano-gold for Cancer Therapy chemistry investigatory project
Nano-gold for Cancer Therapy chemistry investigatory project
 
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdfCHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
CHEMOTHERAPY_RDP_CHAPTER 1_ANTI TB DRUGS.pdf
 
Cell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune DiseaseCell Therapy Expansion and Challenges in Autoimmune Disease
Cell Therapy Expansion and Challenges in Autoimmune Disease
 

Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomics in the NHS in Wales

  • 1. Beating Bugs with Big Data: Harnessing HPC to Realize the Potential of Genomics in the NHS in Wales Dr Tom Connor Bioinformatics Lead, Public Health Wales, Reader, Cardiff University, Group Leader, Quadram Institute Bioscience Supported by
  • 2. Funding Sources: MRC CLIMB, Welsh Government, Public Health Wales NHS Trust, Microbes in the Food Chain
  • 3. Acknowledgements. Other PHW Colleagues: Dr Matt Backx, Dr Catherine Moore, Trefor Morris, Michael Perry, Dr Harriet Hughes, Dr Noel Craine, Dr Simon Cottrell, Helen Adams, David Heyburn, Fatima Downing, Sue Edwards. Cardiff University: Dr Matt Bull, Dr Anna Price, The ARCCA team, CU IT Networking. CLIMB Collaborators: Professor Nick Loman, Simon Thompson, Radoslaw Poplawski, Simon Ellwood-Thompson. PHW Pathogen Genomics: Dr Sally Corden, Joanne Watkins, Lee Graham, Alec Birchley, Bree Wilcox, Jason Coombes, Lauren Gilbert. PHW Genomics Workstreams (ARGENT, DIGEST, WCM, HIV, Influenza)
  • 4. Basic Orientation (map labels: You are here; Wales is here)
  • 5. The NHS in Wales • National Health Service, universal healthcare, free at the point of use • The NHS is a devolved area • Public Health Wales (PHW) is one of the 11 organisations that make up the NHS in Wales • PHW has a wide remit, including: – Running the microbiological diagnostic labs for most of Wales’ hospitals and GPs – Public health and surveillance • PHW offers fully national services for the whole 3.25 million population of Wales
  • 6. Why does microbiology matter? • Infectious disease accounts for ~7% of all deaths in the UK and over £30bn of economic costs per year • Antibiotic resistance alone has a projected cumulative global cost of $100 Trillion by 2050 • We need better tools and systems to prevent, diagnose, track and treat infectious disease
  • 7. Quick biology lesson; introducing microbial genomes • A bacterial cell • The bacterial chromosome for this bacterial cell – its blueprint • That blueprint includes the plans for proteins (“genes”) and other information relating to how and when these need to be made • This genetic information is the organism's genome
  • 8. So, what is Genomics? • Put simply, genomics is the branch of biology that is concerned with studying genomes • However, what really defines the field is the use of sequencing instruments • Sequencing instruments read DNA and, by extension, enable us to read the genome of an organism of interest • From that we can make predictions about the organism of interest, or compare it with others
  • 9. That means genomics has a lot of potential in healthcare • Faster results • More clinically actionable information per test • Clearer, better diagnostics • Genomics enables medicine to become more personalized • Simultaneously, genomics also gives us unprecedented tools to track and combat outbreaks/epidemics on a population level • Data is digital and has many uses
  • 10. Why now? The reduction in the raw cost of sequencing the human genome (figure: cost plotted from 2003 to 2019) * Humans are boring. For the same money we can sequence ~50 bacterial genomes
  • 11. Genomics Partnership Wales • In July 2017 the Welsh Government launched the Genomics for Precision Medicine Strategy, with over £10M spent so far • Covers both human and pathogen genomics • PHW is leading the Pathogen Genomics elements, with a Pathogen Genomics Unit launched in 2018 • Current development areas – AMR bacterial surveillance and characterisation – Cystic Fibrosis polymicrobial infection diagnostics – Enterovirus surveillance • Production systems – C. difficile surveillance and outbreak support – Mycobacteria identification and characterisation – Influenza surveillance – HIV susceptibility testing • Status labels shown on the slide: Accreditation in process; System design/needs assessment in progress; Pilot system development
  • 12. The (clinical) sequencing process has 5 main elements 1. Sample to Sequencer 2. Sequencing 3. Reads to reports 4. Interpretation 5. Fixing it when it breaks Labwork Data science/Bioinformatics
  • 13. Our problem: The sequencing iceberg • Sequencing is now relatively cheap and easy; we can sequence large numbers of strains for modest amounts of money • However, the major costs and difficulties do not lie with the generation of data; they lie with how we share, store and analyse the data we generate: – Bioinformatics expertise – User accessibility of software/hardware – Appropriate compute capacity – Software development – Storage availability – Network capacity • These account for up to 90% of the costs of doing genomics work
  • 14. Pretty picture (figure showing Wave 1, Wave 2 and Wave 3, annotated with date ranges from 1937-61 to 2005-09; Mutreja, Kim, Thomson, Connor et al, Nature, 2011)
  • 15. Not so pretty workload: 320 samples, approx. 600-700GB of uncompressed data
      – Sequence assembly: each job 4-8GB RAM, 1 CPU core; each job generates intermediate files of ~6GB; runtime 1+ hours/job
      – Sequence mapping: 320 jobs; each job 4GB RAM, 1 CPU core; each job generates intermediate files of ~3GB; runtime 1 hour/job
      – Phylogenomics: 1 job, 1+ cores, up to 128GB RAM; intermediate file size ~2+GB; output file ~2GB; runtime 1-2 days
      – Virulence and antimicrobial resistance screening: 320 jobs, single core, 100MB RAM; runtime 5 mins/job; generates 10-20 small files per job
      – Bayesian modelling: 3 jobs, 1+ cores, up to 1GB RAM; CPU intensive; runtime 2 days per job; output file ~10GB; can use GPUs; written in Java
      – Workload labels shown on the slide: Larger RAM, HTC, HTC, HPC, Possibly HPC
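To get a feel for the scale of this mixed workload, the per-job figures on the slide can be turned into rough totals. The sketch below is only a back-of-the-envelope estimate in Python: the `Stage` class and stage names are hypothetical, and it assumes one assembly job per sample, one core wherever the slide says "1+ cores", and the lower bound of each quoted runtime.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One analysis stage, described by the per-job figures from the slide."""
    name: str
    jobs: int
    cores_per_job: int
    hours_per_job: float
    intermediate_gb_per_job: float

    @property
    def core_hours(self) -> float:
        return self.jobs * self.cores_per_job * self.hours_per_job

    @property
    def intermediate_gb(self) -> float:
        return self.jobs * self.intermediate_gb_per_job


# Assumptions (not stated on the slide): one assembly job per sample,
# a single core for "1+ cores", and lower-bound runtimes throughout.
STAGES: List[Stage] = [
    Stage("assembly", 320, 1, 1.0, 6.0),
    Stage("mapping", 320, 1, 1.0, 3.0),
    Stage("phylogenomics", 1, 1, 24.0, 2.0),
    Stage("amr_screening", 320, 1, 5 / 60, 0.0),
    Stage("bayesian_modelling", 3, 1, 48.0, 0.0),
]


def total_core_hours(stages: List[Stage]) -> float:
    """Sum of core-hours across all stages."""
    return sum(s.core_hours for s in stages)


def total_intermediate_gb(stages: List[Stage]) -> float:
    """Sum of intermediate file volume across all stages, in GB."""
    return sum(s.intermediate_gb for s in stages)


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name}: {stage.core_hours:.1f} core-hours, "
              f"{stage.intermediate_gb:.0f} GB intermediate")
    print(f"total core-hours: {total_core_hours(STAGES):.1f}")
    print(f"total intermediate GB: {total_intermediate_gb(STAGES):.0f}")
```

Under these assumptions the many small jobs (assembly, mapping, screening) dominate the job count and intermediate storage, while the few long-running jobs dominate wall-clock time, which is one way to read the HTC versus HPC labels on the slide.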
  • 16. Why the complexity? (Slide background: three raw FASTQ read records, each consisting of a header line such as @M04531:39:000000000-AVVU9:1:1101:12907:2147 1:N:0:52, a nucleotide sequence, a + separator line and a per-base quality string.) • Next generation sequencing involves taking many copies of a genome and smashing them up into millions of fragments that can be sequenced simultaneously using a sequencing instrument • We then have to put this back together • Think of it as a huge, complex jigsaw puzzle, with most pieces being sky, and a number of pieces being from other puzzles • And, unlike in human genomics, we often don’t know what the original ‘puzzle’ is meant to look like • Once we have rebuilt the puzzle, we will have a range of questions we want to ask • These questions require different approaches to find the answers we need • And microbial bioinformatics best practice doesn’t exist yet
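Each of those fragments reaches the analysis pipeline as a read record. Reads are commonly stored in FASTQ format, with four lines per read: a header starting with `@`, the bases, a `+` separator and a quality string. The Python sketch below is a minimal illustration, not the parser used in the service described here; the names `Read`, `parse_fastq` and `mean_quality` are hypothetical, the example record is made up, and it assumes unwrapped four-line records with Phred+33 quality encoding.

```python
from typing import Iterator, List, NamedTuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield reads from FASTQ text, assuming exactly four lines per record."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Assumption: Phred+33 encoding, so the score is the ASCII code minus 33.
        yield Read(header[1:], sequence, [ord(c) - 33 for c in quality])


def mean_quality(read: Read) -> float:
    """Average Phred score across the read."""
    return sum(read.qualities) / len(read.qualities)


# Made-up record for illustration only.
EXAMPLE = """@read1
ACGT
+
BBFF
"""

if __name__ == "__main__":
    for read in parse_fastq(EXAMPLE):
        print(read.name, read.sequence, mean_quality(read))
```

A quality filter built on something like `mean_quality` is typically one of the first steps before the reads are assembled or mapped.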
  • 17. Our challenges • We need to be able to rapidly process patient samples • We need to be able to deal with large volumes of complex data • Need to be able to compare new samples with old • Need to have locked down pipelines that don’t change and can work anywhere • Need to be able to deal with very mixed workloads • Need the ability to scale our analyses • Need to get around issues with physical infrastructure (e.g. hospital networks) • Need to be able to build upon what researchers have done, but in a more enterprise way • Need to do this in an environment (the NHS) which only understands Windows BIOINFORMATICS
  • 18. Infrastructure tiers (diagram labels: Data, Software)
      – MRC CLIMB: 4 sites (1 in Cardiff), each with ~1920 vCPU cores including 3x 3TB RAM nodes; 1.7PB Ceph/site and 500TB of GPFS; runs OpenStack; also enables data sharing with researchers
      – PH CLIMB: 1280 vCPU cores; 1PB object and 80TB SSD block/POSIX storage running Ceph; runs OpenStack and Kubernetes
      – In-lab cluster: 160 CPU core cluster; ~300TB of medium tier NFS storage; 24TB of SSD storage running Ceph
  • 19. Enabling personalised medicine - HIV • HIV is an RNA virus that attacks the immune system, eventually causing AIDS if untreated • HIV spread widely in the 1980s and attracted a lot of hysteria • Today, HIV can be controlled through medication • The medication that a patient takes targets key proteins encoded by the virus • For an individual patient, ensuring that they are on the right drug is critical to ensure that their HIV is controlled • The mutations that create resistance to drugs are encoded in the HIV genome
  • 20. HIV resistance typing using genomics • Our system enables a fully automated process • After DNA has been loaded, the next thing the lab team sees is the reports and QC data • Has underpinned an improvement in turnaround time from an average of 2 weeks to an average of 6 days • Has halved the cost per sample • Has added in integrase testing at no extra cost, increasing the amount of clinically actionable information provided • Used for all HIV patients in Wales
  • 21. Fighting outbreaks in hospitals – Clostridioides difficile • C. difficile is a key pathogen that causes disease in hospitals • Sometimes described as a ‘superbug’, it is often associated with antibiotic use • ~220,000 cases and almost 13,000 deaths in the US in 2017 from C. difficile • It spreads easily, and can survive in the vacuum of space – so is hard to get rid of in hospitals • Until now, tools for tracking have been low-resolution, so don’t allow for effective infection control
  • 22. Outbreak tracking and data integration with C. difficile • Since 2017 we have been working to produce a service for C. difficile outbreak support • Now in full parallel running • Clustering genome sequences and linking this to other data enables patient-level examination of causes and prevention measures
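As an illustration of what clustering genome sequences can look like, the Python sketch below groups samples whose aligned sequences differ by at most a given number of SNPs, using single-linkage clustering. It is only a sketch under stated assumptions and not the method used by the service: the function names, the toy sequences and the threshold of 2 SNPs are all hypothetical, and the threshold is not a validated clinical cut-off.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ (A/C/G/T only)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def cluster_by_snp_threshold(seqs: Dict[str, str], threshold: int) -> List[List[str]]:
    """Single-linkage clustering: link samples within `threshold` SNPs."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    # Largest clusters first, members sorted for stable output.
    return sorted((sorted(m) for m in clusters.values()), key=lambda m: (-len(m), m))


# Toy aligned sequences for hypothetical patient samples.
SAMPLES = {
    "P1": "ACGTACGTAC",
    "P2": "ACGTACGTAC",
    "P3": "ACGTACGTAT",
    "P4": "TTGTACCTAA",
}

if __name__ == "__main__":
    # Threshold of 2 SNPs is an illustrative assumption only.
    print(cluster_by_snp_threshold(SAMPLES, threshold=2))
```

The resulting cluster membership is the kind of output that can then be joined to other data, such as ward and admission dates, to examine possible transmission routes.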
  • 23. Tracking pathogens across healthcare systems • We can move beyond single patients to look at clusters of disease • In North Wales, for example, 11/18 clusters crossed hospitals; the reason for these large clusters is under investigation • This explains why reducing C. difficile rates is so hard – normally we only look within hospitals • Genomics gives us a nationwide view
  • 24. National and international surveillance of Influenza • Not much we can do in terms of treatment – the key for influenza is prevention • Prevention is achieved through vaccination, so we want to – Get the vaccine right – Undertake campaigns to promote vaccination as required • Real time surveillance helps with both of these • Proposal for a pilot study to Welsh Government in October 2017 • We were asked to build a service for Influenza sequencing and surveillance in October 2017 • Went from a standing start to production clinical service in less than 12 months with a team of 2 • Now have one of the fastest turnaround times in the world for routine Influenza surveillance • That was all underpinned by a system that has been engineered for automation and reproducibility from the ground up
  • 25. Global collaboration for improved global health • Lots of our work to date has been about meeting local needs • Two big things are coming internationally that we will be feeding into – First is a new system (called SP3) which will provide a cloud-agnostic, unified pathogen pipeline platform for use in clinical settings • Will provide a system for the global community, combining enterprise-grade software, documented best practice and validation/verification datasets – Second is the Public Health Alliance for Genomic Epidemiology • A new initiative to build standards and best practice around microbial bioinformatics in public health • These are larger open projects with significant buy-in from national and international organisations that will help spread the adoption of genomics in public health
  • 26. Summary • Building clinical genomics services poses big challenges in terms of infrastructure • The system we have built, which integrates HPC and on-premises cloud, has enabled us to – Build world-leading clinical genomics services – Analyse over 8,000 sequenced patient samples in the last 12 months – Support more than 10 distinct analysis pipelines – Track outbreaks across multiple species – Provide better, faster diagnostics for multiple patient groups in Wales • The work we have done is now feeding into international efforts to define and build upon best practice in this area, in which HPC/Infrastructure is a key component (figure legend: 0 SNPs; 1 SNP; Source; Cluster of Patient Samples)
  • 27. Acknowledgements. Other PHW Colleagues: Dr Matt Backx, Dr Catherine Moore, Trefor Morris, Michael Perry, Dr Harriet Hughes, Dr Noel Craine, Dr Simon Cottrell, Helen Adams, David Heyburn, Fatima Downing, Sue Edwards. Cardiff University: Dr Matt Bull, Dr Anna Price, The ARCCA team, CU IT Networking. CLIMB Collaborators: Professor Nick Loman, Simon Thompson, Radoslaw Poplawski, Simon Ellwood-Thompson. PHW Pathogen Genomics: Dr Sally Corden, Joanne Watkins, Lee Graham, Alec Birchley, Bree Wilcox, Jason Coombes, Lauren Gilbert. PHW Genomics Workstreams (ARGENT, DIGEST, WCM, HIV, Influenza)