In this deck from the DDN User Group at SC18, Steve Hindmarsh from the Crick Institute presents: Research Computing Storage at the Francis Crick Institute.
"The Francis Crick Institute is dedicated to understanding the fundamental biology underlying health and disease. Our work is helping to understand why disease develops and to translate discoveries into new ways to prevent, diagnose and treat illnesses such as cancer, heart disease, stroke, infections and neurodegenerative diseases. An independent organisation, our founding partners are the Medical Research Council (MRC), Cancer Research UK, Wellcome, UCL, Imperial College London and King's College London. The Crick was formed in 2015, and in 2016 we moved into our brand new state-of-the-art building in central London which brings together 1500 scientists and support staff working collaboratively across disciplines, making it the biggest biomedical research facility under a single roof in Europe."
Learn more: https://www.crick.ac.uk/
and
http://ddn.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Research Computing Storage at the Francis Crick Institute
1.
2. Research Computing Storage at the Francis Crick Institute
Steve Hindmarsh
Head of Scientific Computing
steve.hindmarsh@crick.ac.uk
Michael Holliday
Senior HPC & Research Data Systems Engineer
michael.holliday@crick.ac.uk
DDN User Group SC18
3. The Francis Crick Institute
• A biomedical discovery institute dedicated to
understanding the fundamental biology underlying
health and disease.
• Founded in 2015 through the merger of two London research institutes, one from the Medical Research Council and one from Cancer Research UK, into the Crick
• Supported by our founding partners:
4. “Discovery without boundaries”
• Core research areas:
• Growth and development
• Health and ageing
• Human biology
• Cancer
• Immune system
• Infectious disease
• Neuroscience
Multi-disciplinary approach: biology, physics,
chemistry, bioinformatics, maths,
engineering…
5. The Crick in numbers
• 1300 Scientists across 130 Labs, Research Groups &
Science Technology Platforms
• 300 Operations Staff
• Over 30,000 pieces of Scientific Equipment
• 13 Floors, 1500 Rooms
• 2 National / International facilities
• 1 Nobel Prize winner!
6. Scientific instruments
• Cryo-Electron microscopes
• Electron microscopes
• Light microscopes
• DNA/RNA sequencers
• Mass spectrometers
• Nuclear Magnetic Resonance (NMR) spectrometers
• Etc.
All produce large volumes of data which needs to be captured, stored
and analysed.
7. DDN at the Crick
DDN underpins our research computing platform providing:
• Scalability – currently 10 PB and filling up fast!
• Performance
• Multiple and varied workloads
• Resilience
• Manageability
So far it has stood up well to all the workloads our scientists can throw at it. New
challenges include:
• Unprecedented data growth
• Machine/Deep Learning/AI – Nvidia V100 GPU cluster coming soon
8.
9. CAMP – Crick Data Analysis and Management Platform –
Where did we come from:
• 4 Isilon Clusters with around 4PB of data going back 40 years
• Numerous NAS boxes, hard drives, USB drives etc. (we’re still finding them now…)
• Storage from the legacy institutes was managed in very different ways, and each had its own domains and permissions
• Data was replicated in multiple locations, but this wasn’t always clear
10. CAMP – Crick Data Analysis and Management Platform –
11. Where we started at the Crick:
[Architecture diagram: CAMP (3 PB); INGEST (1 PB); GENERAL (500 TB); Mill Hill; LIF; GARLIC HPC cluster (200 nodes); connected by AFM relationships (independent writer and read-only) and remote mounts]
12. CAMP – Crick Data Analysis and Management Platform –
• FDR IB Fabric
  • 2 Level Fabric
• CAMP
  • 2 DDN GS12K Systems
  • 20 Enclosures
• INGEST, General, LIF, Mill Hill:
  • Each has 1 DDN GS7K
  • 1 or 2 Enclosures
13. Problems we encountered…
• AFM
• Migration of legacy data
• Data, More Data & Even More Data….
• Instruments
• Labs adapting to the new systems
14. AFM
• INGEST and General were set up to be caches of CAMP storage
• At one point INGEST had 130 AFM links to CAMP
• Delays to syncing were noticeable to scientists, particularly those using the HPC
cluster
• Permissions were not syncing correctly between the two
• Cache was out of sync with home and unable to recover
• Used Independent Writer mode (see the configuration sketch below)
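For context on the AFM issues above: the DDN GridScaler systems run IBM Spectrum Scale (GPFS), where an AFM cache is a fileset that points at a "home" target and operates in a mode such as read-only, single writer or independent writer. The Python sketch below shows roughly how such a cache fileset might be defined and its sync state checked; it assumes the standard mmcrfileset, mmlinkfileset and mmafmctl commands, and all file system, fileset and target names are hypothetical rather than the Crick's actual configuration.

#!/usr/bin/env python3
"""Illustrative sketch only: define a Spectrum Scale AFM cache fileset and
poll its sync state. Every name below is a hypothetical example."""
import subprocess

FS = "camp"                              # hypothetical cache file system
FILESET = "ingest_cache"                 # hypothetical AFM cache fileset
TARGET = "gpfs:///camp/home/ingest"      # hypothetical AFM home target
MODE = "sw"                              # single writer; "iw" = independent writer, "ro" = read-only

def run(cmd):
    """Run a Spectrum Scale admin command and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# Create the cache fileset with its AFM mode and home target, then link it
# into the file system namespace.
run(["mmcrfileset", FS, FILESET,
     "--inode-space", "new",
     "-p", f"afmMode={MODE},afmTarget={TARGET}"])
run(["mmlinkfileset", FS, FILESET, "-J", f"/{FS}/{FILESET}"])

# Report the cache state (Active, Dirty, Unmounted, ...) and queued operations:
# the sync lag that HPC users were noticing.
print(run(["mmafmctl", FS, "getstate", "-j", FILESET]))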
15. Data, More Data & Even More Data….
• Our Labs and STPs are producing more
data at faster and faster rates
• 1 Titan cryo-EM microscope can produce up to 1 PB of data a year (we have 3!); see the rough arithmetic after this list
• Currently around ½ billion files, most only KB in size
• Covers pretty much every file type
• CAMP has been expanded from 3 PB to 10 PB
• Data needs to be kept for 10 years
• Archive facility is in progress!
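To put these figures in perspective, here is a purely back-of-envelope Python sketch using only the numbers quoted on this slide. It assumes all three Titan microscopes run at the quoted peak rate, so it is an upper bound rather than a forecast.

"""Back-of-envelope arithmetic for the growth figures quoted on this slide."""

TITAN_COUNT = 3                # "we have 3!"
PB_PER_TITAN_PER_YEAR = 1.0    # "up to 1 PB of data a year" (quoted peak rate)
RETENTION_YEARS = 10           # "Data needs to be kept for 10 years"
CAMP_CAPACITY_PB = 10.0        # CAMP after its expansion from 3 PB to 10 PB

cryoem_pb_per_year = TITAN_COUNT * PB_PER_TITAN_PER_YEAR
retained_cryoem_pb = cryoem_pb_per_year * RETENTION_YEARS

print(f"Cryo-EM alone, at peak:                {cryoem_pb_per_year:.0f} PB/year")
print(f"Retained over {RETENTION_YEARS} years:                 {retained_cryoem_pb:.0f} PB")
print(f"Years to fill CAMP from cryo-EM alone: {CAMP_CAPACITY_PB / cryoem_pb_per_year:.1f}")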
16. Where we are working towards:
[Target architecture diagram: CAMP (9 PB); INGEST (1-2 PB); Melchett (500 TB); BlackAdder (testing); GARLIC HPC cluster (400 nodes); HPC workstations; Spectrum Protect backup (14 PB); Archive (TBD); connected by AFM relationships (single writer) and remote mounts]