SlideShare a Scribd company logo
Research Computing Storage at
The Francis Crick Institute
Steve Hindmarsh
Head of Scientific Computing
steve.hindmarsh@crick.ac.uk
Michael Holliday
Senior HPC & Research Data Systems Engineer
michael.holliday@crick.ac.uk
DDN User Group SC18
The Francis Crick Institute
• A biomedical discovery institute dedicated to
understanding the fundamental biology underlying
health and disease.
• Founded in 2015 with the merger of two London
research institutes from the Medical Research Council
and Cancer Research UK into the Crick
• Supported by our founding partners:
3
“Discovery without boundaries”
• Core research areas:
• Growth and development
• Health and ageing
• Human biology
• Cancer
• Immune system
• Infectious disease
• Neuroscience
Multi-disciplinary approach: biology, physics,
chemistry, bioinformatics, maths,
engineering…
4
The Crick in numbers
• 1300 Scientists across 130 Labs, Research Groups &
Science Technology Platforms
• 300 Operations Staff
• Over 30,000 pieces of Scientific Equipment
• 13 Floors, 1500 Rooms
• 2 National / International facilities
• 1 Nobel Prize winner!
5
Scientific instruments
• Cryo-Electron microscopes
• Electron microscopes
• Light microscopes
• DNA/RNA sequencers
• Mass spectrometers
• Nuclear Magnetic Resonance (NMR) spectroscopes
• Etc.
All produce large volumes of data which needs to be captured, stored
and analysed.
6
DDN at the Crick
DDN underpins our research computing platform providing:
• Scalability – currently 10 PB and filling up fast!
• Performance
• Multiple and varied workloads
• Resilience
• Manageability
So far it has stood up well to all the workloads our scientists can throw at it. New
challenges include:
• Unprecedented data growth
• Machine/Deep Learning/AI – Nvidia V100 GPU cluster coming soon
7
CAMP – Crick Data Analysis and Management Platform –
Where did we come from:
• 4 Isilon Clusters with around 4PB of data going back 40 years
• Numerous NAS Boxes, Hard drives, USBs etc (we’re still finding out
about them now…)
• Storage from legacy institutes were managed in very different ways,
and had their own Domains and Permissions
• Data replicated in multiple locations but it wasn’t always clear
CAMP – Crick Data Analysis and Management Platform–
Where we started at the Crick:
CAMP
(3PB)
INGEST
(1PB)
GENERAL
(500TB)
Mill Hill LIF
GARLIC
HPC Cluster
(200 Nodes)
AFM
(IW)
AFM
(IW)
AFM
(RO)
AFM
(RO)
Remote Mount Remote Mount
Remote Mount
CAMP – Crick Data Analysis and Management Platform –
• FDR IB Fabric
• 2 Level Fabric
• CAMP
• 2 DDN GS12K Systems
• 20 Enclosures
• INGEST, General, LIF, Millhill:
• Each has 1 DDN GS7K
• 1 or 2 Enclosures
Problems we encountered…
• AFM
• Migration of legacy data
• Data, More Data & Even More Data….
• Instruments
• Labs adapting to the new systems
13
AFM
• INGEST and General were set up to be caches of CAMP storage
• At one point INGEST had 130 AFM links to CAMP
• Delays to syncing were noticeable to scientists, particularly those using the HPC
cluster
• Permissions were not syncing correctly between the two
• Cache was out of sync with home and unable to recover
• Used Independent Writer Mode
14
Data, More Data & Even More Data….
• Our Labs and STPs are producing more
data at faster and faster rates
• 1 Titan Cryo-EM Microscope can produce
up to 1PB data a year (we have 3!).
• Currently ½ Billion Files most only Kb
• Covers pretty much every file type
• CAMP has been Expanded from 3PB to
10PB
• Data needs to be kept for 10 years
• Archive facility is in progress!
16
Where we are working towards:
CAMP
(9PB)
INGEST
(1-2PB)
Melchett
(500TB)
BlackAdder
(testing) GARLIC
HPC Cluster
(400 Nodes)
Spectrum Protect
Backup (14PB)
Archive (TBD)
AFM
(single writer)
Remote Mount
HPC
Workstations
AFM
(single writer)
Questions?
20
Research Computing Storage at the Francis Crick Institute

More Related Content

Similar to Research Computing Storage at the Francis Crick Institute

Hybrid Cloud for CERN
Hybrid Cloud for CERNHybrid Cloud for CERN
Hybrid Cloud for CERN
Helix Nebula The Science Cloud
 
The Cambridge Research Computing Service
The Cambridge Research Computing ServiceThe Cambridge Research Computing Service
The Cambridge Research Computing Service
inside-BigData.com
 
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
thomasrconnor
 
Electron Microscopy Between OPIC, Oxford and eBIC
Electron Microscopy Between OPIC, Oxford and eBICElectron Microscopy Between OPIC, Oxford and eBIC
Electron Microscopy Between OPIC, Oxford and eBIC
Jisc
 
Foundations for the Future of Science
Foundations for the Future of ScienceFoundations for the Future of Science
Foundations for the Future of Science
Globus
 
Advancing Research at London's Global University
Advancing Research at London's Global UniversityAdvancing Research at London's Global University
Advancing Research at London's Global University
inside-BigData.com
 
De gaynsrdec overview_11jan16
De gaynsrdec overview_11jan16De gaynsrdec overview_11jan16
De gaynsrdec overview_11jan16
MIT Enterprise Forum Cambridge
 
Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)
Globus
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
Helix Nebula The Science Cloud
 
Green Shoots: Research Data Management Pilot at Imperial College London
Green Shoots:Research Data Management Pilot at Imperial College LondonGreen Shoots:Research Data Management Pilot at Imperial College London
Green Shoots: Research Data Management Pilot at Imperial College London
Torsten Reimer
 
Big Data for Big Discoveries
Big Data for Big DiscoveriesBig Data for Big Discoveries
Big Data for Big Discoveries
Govnet Events
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
Ian Foster
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
BigData_Europe
 
ELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciencesELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciences
Rafael C. Jimenez
 
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak światPLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
PROIDEA
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
Larry Smarr
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintro
thomasrconnor
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK research
Martin Hamilton
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
Chris Dwan
 

Similar to Research Computing Storage at the Francis Crick Institute (20)

Hybrid Cloud for CERN
Hybrid Cloud for CERNHybrid Cloud for CERN
Hybrid Cloud for CERN
 
The Cambridge Research Computing Service
The Cambridge Research Computing ServiceThe Cambridge Research Computing Service
The Cambridge Research Computing Service
 
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
CLIMB talk in the Virtual Laboratories session at the RCUK Cloud Working Grou...
 
Electron Microscopy Between OPIC, Oxford and eBIC
Electron Microscopy Between OPIC, Oxford and eBICElectron Microscopy Between OPIC, Oxford and eBIC
Electron Microscopy Between OPIC, Oxford and eBIC
 
Foundations for the Future of Science
Foundations for the Future of ScienceFoundations for the Future of Science
Foundations for the Future of Science
 
Advancing Research at London's Global University
Advancing Research at London's Global UniversityAdvancing Research at London's Global University
Advancing Research at London's Global University
 
De gaynsrdec overview_11jan16
De gaynsrdec overview_11jan16De gaynsrdec overview_11jan16
De gaynsrdec overview_11jan16
 
Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
 
Green Shoots: Research Data Management Pilot at Imperial College London
Green Shoots:Research Data Management Pilot at Imperial College LondonGreen Shoots:Research Data Management Pilot at Imperial College London
Green Shoots: Research Data Management Pilot at Imperial College London
 
Big Data for Big Discoveries
Big Data for Big DiscoveriesBig Data for Big Discoveries
Big Data for Big Discoveries
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
ELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciencesELIXIR and data grand challenges in life sciences
ELIXIR and data grand challenges in life sciences
 
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak światPLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
PLNOG 18 - Dr Marek Michalewicz - InfiniCortex: Superkomputer wielki jak świat
 
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway SystemThe Pacific Research Platform:a Science-Driven Big-Data Freeway System
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
 
Climb stateoftheartintro
Climb stateoftheartintroClimb stateoftheartintro
Climb stateoftheartintro
 
Shared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK researchShared services - the future of HPC and big data facilities for UK research
Shared services - the future of HPC and big data facilities for UK research
 
2015 04 bio it world
2015 04 bio it world2015 04 bio it world
2015 04 bio it world
 

More from inside-BigData.com

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
inside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
inside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
inside-BigData.com
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
inside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
inside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 

Recently uploaded (20)

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 

Research Computing Storage at the Francis Crick Institute

  • 1.
  • 2. Research Computing Storage at The Francis Crick Institute Steve Hindmarsh Head of Scientific Computing steve.hindmarsh@crick.ac.uk Michael Holliday Senior HPC & Research Data Systems Engineer michael.holliday@crick.ac.uk DDN User Group SC18
  • 3. The Francis Crick Institute • A biomedical discovery institute dedicated to understanding the fundamental biology underlying health and disease. • Founded in 2015 with the merger of two London research institutes from the Medical Research Council and Cancer Research UK into the Crick • Supported by our founding partners: 3
  • 4. “Discovery without boundaries” • Core research areas: • Growth and development • Health and ageing • Human biology • Cancer • Immune system • Infectious disease • Neuroscience Multi-disciplinary approach: biology, physics, chemistry, bioinformatics, maths, engineering… 4
  • 5. The Crick in numbers • 1300 Scientists across 130 Labs, Research Groups & Science Technology Platforms • 300 Operations Staff • Over 30,000 pieces of Scientific Equipment • 13 Floors, 1500 Rooms • 2 National / International facilities • 1 Nobel Prize winner! 5
  • 6. Scientific instruments • Cryo-Electron microscopes • Electron microscopes • Light microscopes • DNA/RNA sequencers • Mass spectrometers • Nuclear Magnetic Resonance (NMR) spectroscopes • Etc. All produce large volumes of data which needs to be captured, stored and analysed. 6
  • 7. DDN at the Crick DDN underpins our research computing platform providing: • Scalability – currently 10 PB and filling up fast! • Performance • Multiple and varied workloads • Resilience • Manageability So far it has stood up well to all the workloads our scientists can throw at it. New challenges include: • Unprecedented data growth • Machine/Deep Learning/AI – Nvidia V100 GPU cluster coming soon 7
  • 8.
  • 9. CAMP – Crick Data Analysis and Management Platform – Where did we come from: • 4 Isilon Clusters with around 4PB of data going back 40 years • Numerous NAS Boxes, Hard drives, USBs etc (we’re still finding out about them now…) • Storage from legacy institutes were managed in very different ways, and had their own Domains and Permissions • Data replicated in multiple locations but it wasn’t always clear
  • 10. CAMP – Crick Data Analysis and Management Platform–
  • 11. Where we started at the Crick: CAMP (3PB) INGEST (1PB) GENERAL (500TB) Mill Hill LIF GARLIC HPC Cluster (200 Nodes) AFM (IW) AFM (IW) AFM (RO) AFM (RO) Remote Mount Remote Mount Remote Mount
  • 12. CAMP – Crick Data Analysis and Management Platform – • FDR IB Fabric • 2 Level Fabric • CAMP • 2 DDN GS12K Systems • 20 Enclosures • INGEST, General, LIF, Millhill: • Each has 1 DDN GS7K • 1 or 2 Enclosures
  • 13. Problems we encountered… • AFM • Migration of legacy data • Data, More Data & Even More Data…. • Instruments • Labs adapting to the new systems 13
  • 14. AFM • INGEST and General were set up to be caches of CAMP storage • At one point INGEST had 130 AFM links to CAMP • Delays to syncing were noticeable to scientists, particularly those using the HPC cluster • Permissions were not syncing correctly between the two • Cache was out of sync with home and unable to recover • Used Independent Writer Mode 14
  • 15. Data, More Data & Even More Data…. • Our Labs and STPs are producing more data at faster and faster rates • 1 Titan Cryo-EM Microscope can produce up to 1PB data a year (we have 3!). • Currently ½ Billion Files most only Kb • Covers pretty much every file type • CAMP has been Expanded from 3PB to 10PB • Data needs to be kept for 10 years • Archive facility is in progress! 16
  • 16. Where we are working towards: CAMP (9PB) INGEST (1-2PB) Melchett (500TB) BlackAdder (testing) GARLIC HPC Cluster (400 Nodes) Spectrum Protect Backup (14PB) Archive (TBD) AFM (single writer) Remote Mount HPC Workstations AFM (single writer)