From Jisc's campus network engineering for data-intensive science workshop on 19 October 2016.
https://www.jisc.ac.uk/events/campus-network-engineering-for-data-intensive-science-workshop-19-oct-2016
Electron Microscopy Between OPIC, Oxford and eBIC
1. Robert Esnouf, Campus network engineering workshop
19/10/2016 Electron Microscopy Between OPIC, Oxford and eBIC
2. Electron Microscopy Between
OPIC, Oxford and eBIC, Harwell
Robert Esnouf (robert@well.ox.ac.uk),
Head of Research Computing Core,
Wellcome Trust Centre for Human Genetics,
Old Road Campus,
University of Oxford
Campus Network Engineering for Data-Intensive Science,
London, 19 October 2016
3. Overview of talk
The Wellcome Trust Centre for Human Genetics: science & facilities
Why is electron microscopy such hot science?
OPIC and eBIC
Networking challenges of the OPIC/eBIC model
4. The Old Road Campus, University of Oxford
One of Europe’s largest biomedical research campuses
◦ In east Oxford near the John Radcliffe, Churchill, Nuffield Orthopaedic & Warneford Hospitals
◦ First building (HWBGM) opened 1999
◦ Already ~2000 researchers, with space to double…
5. The Wellcome Trust Centre for Human Genetics
A department of the University of Oxford
◦ Founded in 1994 with core support from the WT
◦ Moved to a new building in 1999 – the Henry Wellcome Building for Genomic Medicine on the Old Road Campus
6. The Wellcome Trust Centre for Human Genetics
About 500 researchers
◦ “to advance the understanding of genetically-related conditions through multi-disciplinary research”
◦ Sequencing, statistical genetics, disease-focused research (diabetes, obesity, heart disease, malaria), optical microscopy, MRI, functional genetics, crystallography & electron microscopy (STRUBI & OPIC)
7. WTCHG Research Computing Core
The ResComp Core is squeezed into a tiny room
◦ 4,120 compute cores, 4.2PB raw GPFS storage, 3.9PB other (archive) storage; FDR InfiniBand
◦ 2.2 FTE to manage (me, Jon, and 20% of Colin Freeman)
In 2015, the ResComp Core delivered:
◦ Compute to 303 users (150 active) from 32 groups
◦ 2,640 cores of the main cluster delivered 55.5 billion seconds (1,761 years) of CPU time
◦ 27 different users from 12 different research groups each used >20 years of CPU time
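As a quick sanity check (my arithmetic, not from the slides), the quoted CPU-time figures are self-consistent:

```python
# Sanity check of the ResComp cluster figures quoted above:
# 2,640 cores delivering 55.5 billion CPU-seconds in 2015.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

cpu_seconds = 55.5e9
cores = 2640

cpu_years = cpu_seconds / SECONDS_PER_YEAR
utilisation = cpu_years / cores  # fraction of the year an average core was busy

print(round(cpu_years))       # 1760 CPU-years (slide says 1,761 -- rounding)
print(round(utilisation, 2))  # 0.67, i.e. roughly two-thirds average utilisation
```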
9. Sequencing facilities
WTCHG Oxford Genomics Centre
◦ Illumina systems: HiSeq 2500, HiSeq 4000, MiSeq
◦ IonTorrent, genotyping (Solexa?)
◦ Evaluating long-read technologies (have a MinION)
Mixture of WGS, exome sequencing, RNA-seq, single-cell work, custom sequencing
Approximately 1,000 genomes per year
2PB of base call files → 100TB of BAMs
10. Long-read sequencing technologies
ONT MinION
ONT PromethION
Other systems (PacBio)
11. Processing ONT long-read sequencing
Processing from MinION readers
◦ Each pore produces a file of 100-100,000 base calls
◦ Many small files produced, but modest data volume
What about the PromethION (up to 48 flow cells)?
◦ Up to 80GB/hour → 3.8TB in 2 days per flow cell
◦ Average 200kB files → 400,000 files/hour/flow cell
Frightening headline numbers – each 2-day run could generate:
◦ 182.4TB of FAST5 files @ 8.5Gbit/s
◦ 921.6 million 100kB-1MB files
◦ Requiring 960 cores to process
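The headline numbers follow directly from the per-flow-cell rates; a back-of-envelope sketch (my arithmetic, assuming all 48 flow cells run flat out for the full 48 hours):

```python
# Reproduce the PromethION headline numbers from the per-flow-cell rates.
flow_cells = 48
gb_per_hour = 80          # per flow cell
run_hours = 48            # a 2-day run
files_per_hour = 400_000  # average ~200kB FAST5 files, per flow cell

tb_per_flow_cell = gb_per_hour * run_hours / 1000
total_tb = tb_per_flow_cell * flow_cells
aggregate_gbit_s = flow_cells * gb_per_hour * 8 / 3600  # GB/hour -> Gbit/s
total_files = files_per_hour * flow_cells * run_hours

print(round(tb_per_flow_cell, 2))   # 3.84 TB per flow cell ("3.8TB in 2 days")
print(round(total_tb, 1))           # 184.3 TB (slide rounds 3.8 x 48 = 182.4)
print(round(aggregate_gbit_s, 1))   # 8.5 Gbit/s sustained
print(round(total_files / 1e6, 1))  # 921.6 million files
```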
13. Oxford Particle Imaging Centre (OPIC)
An EM facility unique in Europe
◦ Biosafety containment suite (ACDP3/DEFRA4)
◦ FEI Tecnai Polara EM with Gatan K2 detector
◦ Can be accessed by UK researchers (20% eBIC)
A second EM in normal containment
400fps movies “flattened” to images (>1TB/day)
14. Single particle structures by EM (RELION)
Images are translucent, like hospital X-rays
A computation- and memory-intensive process
◦ Correct for drift and shake, detect particles
◦ Extract particle images (2TB → 50-100GB)
◦ 2D classification of particle projections
◦ 3D classification of particles
◦ 3D refinement of structural model(s)
Each particle is covered by a pixel box
◦ Cubic dependence of memory on box size
◦ A 400-pixel box (picornavirus) requires 300GB
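The cubic memory scaling can be sketched numerically; the constant is calibrated from the figure quoted above (a 400-pixel box needing ~300GB), and everything else is my extrapolation:

```python
# Illustrate the cubic dependence of refinement memory on particle box size,
# calibrated from the slide's figure: a 400-pixel box needs ~300GB.
def refinement_memory_gb(box_px, ref_box=400, ref_gb=300):
    """Estimated memory for a given box size, assuming mem ~ box^3."""
    return ref_gb * (box_px / ref_box) ** 3

for box in (200, 300, 400, 500):
    print(box, round(refinement_memory_gb(box), 1))
# 200 37.5   -- halving the box cuts memory 8x
# 300 126.6
# 400 300.0
# 500 585.9
```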
16. Net result – example from FMDV EM & X-ray structures
(12h data collection for EM)
X-ray: 2.6Å
EM: 3.3Å
18. eBIC: the Electron Bio-Imaging Centre
Diamond Light Source, Harwell Science and Innovation Campus
19. Data collection statistics
(chart: number of unique groups and allocated time, by institution)
*14 groups from Cambridge, 10 from Oxford, 6 from Birkbeck, 5 from Imperial, 4 each from Manchester and Bilbao, 3 from Leeds, 2 each from Edinburgh, the Crick and Dundee, and 1 each from Diamond, Warwick, Madrid, Bristol, Leicester, Helsinki, Sheffield, Stockholm, Virginia and SPring-8
20. Industrialization of EM
(installed on the synchrotron hall floor)
Installation: 8/5-5/6 (2015)
External users: Monday 29/6
Publications in Cell & Nat. Comms within the first ½ year
Heavy oversubscription and very high industrial interest
“The democratization of cryo-EM” – Nat. Methods, 2016, Stuart, Subramaniam & Abrescia
21. The eBIC ‘hall’ in the new building will accommodate 4 Krios microscopes
In addition to Krios I and II, Talos and Scios machines are now operational
Two further Krios microscopes are ordered, for installation in 2016 and 2017
Recruitment is underway – challenging in the present EM feeding frenzy
22. Time-critical, first-pass processing
Microscopes are expensive
◦ ~£2-5M to buy + £1M p.a. to run
Immense shortage of expert staff time
Not all samples give good images
◦ Bad samples, problems with optics
◦ Need to detect quickly
◦ First-pass processing is a CTF (contrast transfer function) estimation
◦ Process ~50GB of images, results back in 30s
Collected in Oxford, processed at Harwell!
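For the Oxford-to-Harwell loop, the transfer time dominates unless the link is fast; a rough sketch (the 80% efficiency derating and the 10Gbit/s path are my assumptions, the latter matching the links shown on later slides):

```python
# Rough time to move a ~50GB first-pass dataset over a shared link.
def transfer_seconds(size_gb, link_gbit_s, efficiency=0.8):
    """Transfer time, derating the link for protocol and sharing overhead."""
    return size_gb * 8 / (link_gbit_s * efficiency)

print(round(transfer_seconds(50, 10)))  # 50 s on a 10Gbit/s path
print(round(transfer_seconds(50, 1)))   # 500 s on 1Gbit/s -- far too slow
```

Even on a clean 10Gbit/s path, moving the raw data takes longer than the 30s turnaround quoted for the results, which is why the raw collection has to stream continuously rather than wait for batch copies.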
23. New fibres within WTCHG
(diagram: new fibre runs – 16F OM4, 8F OM4, 16F OM3 and 8F OM3 – linking the IT1 and IT2 Comms rooms, the OPIC Comms room, the Containment Lab, the Open Lab, the STRUBI Servers and the Cluster Room)
24. New network within WTCHG
(diagram: 2x MM 10g links between the comms rooms, 1x MM 10g to each microscope, and 1x MM 10g through the firewall to the University network)
25. The Oxford University Network
(diagram: university backbone nodes – labelled CORCCMUS and CINDCROQ in the figure – joined by 40Gbit/s links; WTCHG connects via a FroDo, with TVN PoPs on further 10Gbit/s links; the 40Gbit/s links carry 10Gbit/s to .well.ox and 10Gbit/s to .strubi.ox)
26. Thanks to...
The WTCHG ResComp Core
◦ Jon Diprose and Colin Freeman’s left leg
STRUBI and OPIC Microscopy Staff
◦ esp. Juha Huiskonen and Abhay Kotecha
Staff at Diamond Light Source and eBIC
◦ David Stuart, Alistair Siebert, …
All of you for your attention!