Species ID
Align MinION reads to taxon-specific genes (MetaPhlAn2) using LAST.
S. enterica identified in under 30 mins
Serogroup ID
Align MinION reads to S. enterica reference genome. Phylogenetic placement with pplacer.
Within 50 mins we could identify the strains as serotype Enteritidis
Outbreak ID
Alignment of MinION reads to Salmonella Enteritidis reference genome. Phylogenetic placement with pplacer.
Within 100 minutes the outbreak strains were unambiguously part of the national 14b cluster (RED) and the sporadic cases (BLUE) were indeed sporadic
Quick et al – Genome Biology in press
Dstl Protocol Development
• Two approaches tried on archived Ebola RNA:
– Genome tiling amplicon: RT-PCR
– Direct metagenomics: Ebola-spiked sample
• Validation with MiSeq
So, what happened in April?
[Timeline figure: May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, Jan, Feb, Mar, Apr]
Wish List
What now?
• Fast mode / more pores
– Diagnostic & environmental metagenomics
• Rapid sample prep
• (very) Low input
• Direct RNA sequencing
MinION is unique:
• Real-time
• Portable
• Super long reads
Acknowledgements
Oxford Nanopore
• Clive Brown
• Zoe Macdougall
• Daniel Turner
• Stephanie Brooking
• Oliver Hartwell
• Gordon Sanghera
• Spike Willcocks
• Roger Pettett
• Stodders
• Apollo Steve
• Bhupinder
• ..and the entire team
Dstl Porton Down
• Simon Weller
• Phil Rachwal
• Jamie Taylor
Heartlands Hospital
• Peter Hawkey
Public Health England
• Phil Ashton
• Tim Dallman
GigaScience
• Scott Edmunds
University of Birmingham
• Cathy Wardius
• Szymon Calus
Josh Quick
Editor's Notes
For the first month or two we had the MinION – and the test flow cell. The flow cell test script was fun to play with for the first couple of hundred runs but eventually got a bit boring. Lots of people kept coming to the office to look at it and asked what we had done.
Whilst we were waiting for reagents we went for breakfast in a hipster café – this is the soap they have in the washrooms.
Things got real in June.
A confession about our lambda burn-in: we raced through the protocol, and as soon as we got a mapping read of about 3 kb we ticked the box saying the MinION was fit for our purposes! I don’t think we looked at lambda any more; we moved on to bacterial samples.
The first sample we tried was a Pseudomonas aeruginosa. This was R6 chemistry. Back then we didn’t really get any 2D reads to speak of. We put this run on in the evening and went home. I think it was a Friday. I got up early on the Saturday and started screening the longest, juiciest-looking reads with BLAST. Most of them didn’t map to anything. But this one matched something. It was a template read, and it was about 8 kb.
What was extraordinary about this read was that it mapped across the whole of the O-antigen determining locus of Pseudomonas aeruginosa. So there are two ways of looking at this read, depending on your personality: 1) it has only 68% identity and is completely useless; 2) the way I look at it, this single read, generated by a portable genome sequencer, has in real time entirely replaced a complex laboratory serotyping protocol, which I list below. Remember that this protocol needs to be performed for every single serotype you want to test. This is an O6 Pseudomonas.
The obvious thing to do next was to take the MinION home. In this case we made the library in the lab, and took the sequencer home. You can see the chilled library in the fridge, also some yoghurt, mustard and rice pudding.
We needed some tools to analyse the data and use it. This became poretools. One thing that’s been fun about the MAP is interacting with other great bioinformaticians. In this case I had some fairly ropey Python scripts to do basic manipulation of FAST5. Aaron Quinlan is the author of bedtools and he had some much more principled scripts. So we decided to team up on poretools.
Poretools is the way we do the initial data processing. It can do basic interrogation of FAST5 files and visualisation of runs. It will run on the MinKNOW laptop. At the moment it uses Rpy which has been a bit of a pig to install for people, so we’ve decided to dump that and move to a web-based toolkit. Aaron has recently been building in functionality to let poretools read streams of reads rather than finished run directories.
So: we wanted to use the MinION for something useful. Luckily/unluckily in June and July there was the start of a major outbreak. Right now we are transitioning to WGS for looking at outbreaks, and this is being picked up by the regional and national public health laboratories.
In our hospital we had a large outbreak of Salmonella enterica Enteritidis, affecting 30 patients. An outbreak like this, which was seen across multiple wards, raises several questions: 1) is this a point source outbreak? 2) does it reflect multiple importations from the community? 3) is there spread within the hospital? 4) how do we know it has been dealt with? 5) where did it come from originally? WGS helps us answer all these questions, particularly when integrated with national and international surveillance data.
We wanted to test whether the MinION could produce actionable information in an outbreak. We developed a couple of new methods to do this:
Even with R7 data at ~72% “normal” 2D accuracy, it was possible to get reliable genotypic placements from very low genome coverage (~3x) within an hour or two.
So, we were motoring away happily, focusing on applications and then this bombshell was dropped. I felt this paper, other than being rather poorly done, represented a bit of a betrayal of the philosophy of the MAP.
Therefore we accelerated our plans to get some initial analysis out, and in September we got the R7.3 kits and were able to generate whole genome coverage for E. coli K-12, which we released in GigaDB, as a preprint and then in October in GigaScience.
Anyone that has been to the UK knows that by November, the promise of spring and the glorious long hot days of summer are but a distant memory. Winter is coming to Westeros. And as most people know, Christmas is a time when we have to spend time with the family and in-laws. Which could be analogous to scaling the North Wall. But not in my case!! <joke>. But just before Xmas I had the opportunity to work with another fantastic bioinformatician, Jared Simpson.
Jared thought that we could use the GigaScience data we had produced to do de novo assembly without reference to short read data. We adopted the PacBio HGAP system as our model. Jared found that DALIGNER from Gene Myers was very good at finding overlaps in data with about 80% identity, and that POA was good at doing initial correction. After two rounds of correction the read data is ~98% accurate.
Feeding this data into Celera, after many weeks of twiddling with the correction and with the Celera parameters, gave us a single contig for E. coli. Jared will talk more about this later on.
Now throughout this year, the Ebola outbreak was ravaging West Africa, and it was always in my mind that we had a potential technological solution to help with the problem of generating genome data. However, there were still some doubts as to whether the instrument would be accurate enough to be useful in genotyping.
Then Mark Akeson, Benedict Paten and Miten Jain published their paper about marginAlign, showing that R7.3 data was suitable for detecting SNPs with very high precision and recall (~99%) given sufficient coverage. This gave me confidence that it might work well on Ebola, where the substitution frequency is only 0.1% and the viral genome is 20 kb, so high coverage is easily achievable.
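To put rough numbers on that reasoning, here is a minimal back-of-envelope sketch. The 20 kb genome and 0.1% substitution frequency are the figures quoted above; the target coverage and mean read length are purely illustrative assumptions, not values from our protocol.

```python
import math


def expected_variant_sites(genome_length_bp, substitution_frequency):
    """Expected number of substituted positions across the genome."""
    return genome_length_bp * substitution_frequency


def reads_for_coverage(genome_length_bp, target_coverage, mean_read_length_bp):
    """Reads needed so that total sequenced bases / genome length >= target coverage."""
    return math.ceil(genome_length_bp * target_coverage / mean_read_length_bp)


GENOME_LENGTH_BP = 20_000        # from the notes: viral genome is 20 kb
SUBSTITUTION_FREQUENCY = 0.001   # from the notes: 0.1%
TARGET_COVERAGE = 50             # assumption, for illustration only
MEAN_READ_LENGTH_BP = 1_000      # assumption, for illustration only

sites = expected_variant_sites(GENOME_LENGTH_BP, SUBSTITUTION_FREQUENCY)
reads = reads_for_coverage(GENOME_LENGTH_BP, TARGET_COVERAGE, MEAN_READ_LENGTH_BP)
print(f"Expected substituted sites: about {sites:.0f}")
print(f"Reads needed for {TARGET_COVERAGE}x coverage: {reads}")
```

Under these assumptions only a couple of dozen sites are expected to differ, and the number of reads needed for deep coverage of such a small genome is modest.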
Mick Watson and I wrote a piece for Nature Methods praising this paper.
At the same time, Nature and others were showing how poorly we are doing at real-time surveillance of Ebola, with huge gaps in sequencing coverage during the outbreak. This is unacceptable because the genome information is of great use for vaccine development, treatment development, understanding of mutation rates and evolution and, we think, local epidemiology.
At this point, we teamed up with Dstl Porton Down to do some initial Ebola protocol validation. On the way past Salisbury Plain, I noticed something previously unseen by generations of archaeologists … Thinking it must be a good omen, I continued on.
We tried two different methods: a one-step RT-PCR using tiling amplicons, and a two-step total RNA process using the low-input protocol, which was laborious. We decided that the PCR sequencing would be better, mainly because the total RNA didn’t produce any reads. We also sequenced the same archive strain on MiSeq to validate the SNP calling.
Around this time there were updates to the MinION software and new chemistry, and now we were seeing accuracy just shy of 90% for PCR positive runs.
We were basically set, so the final hurdle was: could we make the instrument portable and take everything in hold luggage?
Josh spent many days packing and repacking his bags. In this case the bulkiest instrument was our PCR machine. We planned to take 3 MinIONs and 3 laptops in order to do 3 samples each day.
What happened next? Find out at Josh’s talk tomorrow!
So to conclude, we have proved the MinION has unique characteristics. This is what we are looking forward to now.