Globus in European Life Science

Steven Newhouse
Head of Technical Services, EMBL-EBI
steven.newhouse@ebi.ac.uk
Globus in European Life-Science
GlobusWorld 2019
The European Molecular Biology Laboratory
• 6 sites in Europe
• >1600 personnel
• 80+ nationalities
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Monterotondo, Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
What is EMBL-EBI?
• Europe’s home for biological data services, research and training
• A trusted data provider for the life sciences
• International: 600 members of staff from 60 nations
OUR MISSION (1/5)
To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
Literature services
• BioStudies
• Europe PMC
Chemistry services
• ChEBI
• ChEMBL
• MetaboLights
• SureChEMBL
Macromolecular & cellular structure
• Protein Data Bank in Europe (PDBe)
• PDBe-KB
• Electron Microscopy Data Bank
• EMPIAR
Molecular atlas
• ArrayExpress
• Expression Atlas
• PRIDE
Proteins & protein families
• MGnify
• InterPro
• Pfam
• Rfam
• RNAcentral
• UniProt
Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
Molecular systems
• BioModels
• IntAct
• OmicsDI
• Reactome
Molecular archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• Experimental Factor Ontology
• BioSamples
• Mouse Resources
Data resources at EMBL-EBI
Cross-domain resources
What we do:
Data In → Validate → Correlate → Data Out
Volume: ~2PB/month
• FTP: 56%
• Aspera: 42%
• Globus: 2%
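As a rough illustration of what the protocol split means in absolute terms, the Python sketch below assumes the percentages apply to the ~2PB/month total and uses decimal units (1 PB = 1000 TB); `volume_by_protocol` is a hypothetical helper, not an EMBL-EBI tool.

```python
# Approximate monthly volume per protocol, using the slide's figures:
# ~2 PB/month in total, split FTP 56%, Aspera 42%, Globus 2%.
TOTAL_PB_PER_MONTH = 2.0  # "~2PB/month" (approximate)
SHARES = {"FTP": 0.56, "Aspera": 0.42, "Globus": 0.02}


def volume_by_protocol(total_pb: float, shares: dict[str, float]) -> dict[str, float]:
    """Return the volume (PB/month) carried by each protocol."""
    # Guard against a split that does not add up to 100%.
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("protocol shares must sum to 100%")
    return {name: total_pb * share for name, share in shares.items()}


if __name__ == "__main__":
    for name, pb in volume_by_protocol(TOTAL_PB_PER_MONTH, SHARES).items():
        # Decimal units assumed: 1 PB = 1000 TB.
        print(f"{name:7s} {pb:5.2f} PB/month (~{pb * 1000:.0f} TB)")
```

On these assumptions Globus carried on the order of 40 TB/month, against roughly 1.1 PB/month over FTP.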
Analysis Capacity:
• HTC: 28,500 job slots
• HPC: 6,600 job slots
• Cloud: 6,000 vCPUs
• VMware: 1,500 cores
Raw Storage (241PB):
• Object Store: 103PB
• NAS: 81PB
• HPC Storage: 27PB
• Tape: 30PB
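The four storage tiers add up to the stated 241PB total. The short Python sketch below (hypothetical names, plain standard library) checks that sum and computes each tier's share of raw capacity.

```python
# Raw storage tiers from the slide, in PB.
RAW_STORAGE_PB = {"Object Store": 103, "NAS": 81, "HPC Storage": 27, "Tape": 30}
STATED_TOTAL_PB = 241  # "Raw Storage (241PB)"


def tier_shares(tiers: dict[str, int]) -> dict[str, float]:
    """Return each tier's fraction of the summed raw capacity."""
    total = sum(tiers.values())
    return {name: pb / total for name, pb in tiers.items()}


if __name__ == "__main__":
    # The tiers should add up to the total quoted on the slide.
    assert sum(RAW_STORAGE_PB.values()) == STATED_TOTAL_PB
    for name, share in tier_shares(RAW_STORAGE_PB).items():
        print(f"{name:12s} {RAW_STORAGE_PB[name]:4d} PB {share:6.1%}")
```

Object storage is the largest tier, at a little under 43% of raw capacity.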
~38 million requests to EMBL-EBI websites every day
EMBL-EBI delivered 140 million jobs to its users in 2017
Requests from 3.3 million unique hosts to the EMBL-EBI websites each month
~1PB/month
ELIXIR – Research Infrastructure for Life Science
• Tools: Services & connectors to drive access and exploitation
• Standards: Integration and interoperability of data and services
• Training: Professional skills for managing and exploiting data
• Compute: Access, Exchange & Compute on sensitive data
• Data: Sustain core data resources
Current Integration
• ELIXIR AAI & EMBL-EBI IdP
  • Consistent ID provision across Europe and ELIXIR services
  • Integrated into Globus Transfer
• Data Transfers
  • From Data Resources (e.g. EMBL-EBI) to a researcher’s desktop
  • From Data Resources (e.g. EMBL-EBI) to a cloud provider
  • From a researcher’s institute to a cloud provider
Planned Overhaul of Transfer Infrastructure at EMBL-EBI
• Downloads
  • Would like to move away from Aspera
  • Performance w.r.t. Globus Transfer?
  • Would like to increase use of Globus Transfer
  • Understanding the barriers to adoption? Technical? Political?
• Uploads
  • Moving towards an integrated upload infrastructure: common AAI & file space
  • Explore the use of Globus Transfer: ease of use, installation, AAI & performance
  • Current prototype uses Tus.io
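Tus is an open HTTP-based protocol for resumable uploads: the client creates an upload with a POST carrying `Upload-Length`, sends bytes in PATCH requests carrying `Upload-Offset`, and after an interruption asks the server (with HEAD) how many bytes it already holds. The Python sketch below only plans those requests and does no networking; it illustrates the protocol's offset bookkeeping and is not EMBL-EBI's prototype. `creation_headers` and `plan_patches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterator

TUS_VERSION = "1.0.0"  # version string sent in the Tus-Resumable header


@dataclass(frozen=True)
class PatchRequest:
    """One PATCH request in a tus upload: which bytes to send, with which headers."""
    offset: int
    length: int
    headers: Dict[str, str]


def creation_headers(upload_length: int) -> Dict[str, str]:
    """Headers for the POST that creates a new upload (tus Creation extension)."""
    return {"Tus-Resumable": TUS_VERSION, "Upload-Length": str(upload_length)}


def plan_patches(upload_length: int, server_offset: int, chunk_size: int) -> Iterator[PatchRequest]:
    """Yield the PATCH requests still needed to complete an upload.

    server_offset is the Upload-Offset the server returns for a HEAD request
    on the upload URL, so an interrupted upload resumes from the last byte the
    server stored rather than from byte 0.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_offset <= upload_length:
        raise ValueError("server offset lies outside the file")
    offset = server_offset
    while offset < upload_length:
        length = min(chunk_size, upload_length - offset)
        yield PatchRequest(
            offset=offset,
            length=length,
            headers={
                "Tus-Resumable": TUS_VERSION,
                "Upload-Offset": str(offset),
                "Content-Type": "application/offset+octet-stream",
                "Content-Length": str(length),
            },
        )
        offset += length


if __name__ == "__main__":
    MiB = 1024 * 1024
    # A 10 MiB file whose upload was interrupted after the server stored 4 MiB.
    for req in plan_patches(10 * MiB, server_offset=4 * MiB, chunk_size=4 * MiB):
        print(req.headers["Upload-Offset"], req.length)
```

In the example only the remaining 6 MiB is sent, as one 4 MiB chunk and one 2 MiB chunk.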
Future: Accessing Life-Science Data from Object Store
• FIRE: FIle REplication Service
  • In existence for over 10 years
  • Grown to over 20PB
• Evolution of technologies
  • Previous: Distinct NFS systems
  • Now: Distributed internal Object Store & tape
  • Future: Distributed internal Object Store & cloud
• Challenge: Very long tail of data access patterns
  • Need ‘shopping cart’ model to retrieve data from cold storage and deliver to endpoint
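One way to read the ‘shopping cart’ idea: the user collects files into a cart, files held on tape are recalled together as one batch rather than one by one, and delivery to the endpoint starts once everything is online. The Python sketch below is a hypothetical model of that workflow (the class names, the two tiers and the endpoint string are all assumptions), not FIRE's actual interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Tier(Enum):
    ONLINE = "object store"  # can be delivered straight away
    TAPE = "tape"            # cold: must be recalled before delivery


@dataclass
class CartItem:
    path: str
    tier: Tier


@dataclass
class Cart:
    destination: str  # hypothetical identifier for the user's transfer endpoint
    items: List[CartItem] = field(default_factory=list)

    def add(self, path: str, tier: Tier) -> None:
        self.items.append(CartItem(path, tier))

    def checkout(self) -> Tuple[List[str], List[str]]:
        """Return (paths to recall from tape as one batch, paths deliverable now)."""
        recall = [i.path for i in self.items if i.tier is Tier.TAPE]
        ready = [i.path for i in self.items if i.tier is Tier.ONLINE]
        return recall, ready

    def mark_staged(self, path: str) -> None:
        """Record that a recalled file has been copied to online storage."""
        for item in self.items:
            if item.path == path:
                item.tier = Tier.ONLINE

    def ready_to_deliver(self) -> bool:
        """True once every file in the cart is online and can be sent on."""
        return all(i.tier is Tier.ONLINE for i in self.items)


if __name__ == "__main__":
    cart = Cart(destination="example-endpoint")
    cart.add("run1.fastq.gz", Tier.TAPE)
    cart.add("run2.fastq.gz", Tier.ONLINE)
    recall, ready = cart.checkout()
    print("recall:", recall, "ready:", ready)
    for path in recall:
        cart.mark_staged(path)  # in practice, after the tape recall completes
    print("deliver to", cart.destination, ":", cart.ready_to_deliver())
```

Batching the recall is the point of the cart: with a very long tail of access patterns, a user's request is likely to mix hot and cold files, and grouping the cold ones lets the storage system schedule them together.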
Future: Moving Data within a Hybrid Ecosystem
• European Open Science Cloud (EOSC)
  • Federation of cloud resources (a.k.a. grid)
  • Integration alongside commercial cloud resources
  • More broadly the services needed for the research life-cycle
• ELIXIR Cloud Resources
  • National & domain cloud resources will probably appear within EOSC
• EMBL-EBI Cloud Resources
  • For our own purposes… need to move data from internal to cloud resources
  • And for the community!
Summary
• Some use within EMBL-EBI for edge downloads
• Scope for wider use, and for integration into uploads
• Need reliable transfer to underpin movement of data sets
  • To users, service providers and public clouds
• Contact today:
  • Steven Newhouse (steven.newhouse@ebi.ac.uk)
  • Andrea Cristofori (crsndr@ebi.ac.uk)
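Reliable transfer usually means more than copying bytes: the receiving side verifies the data against a checksum and retries on a mismatch. The Python sketch below shows that verify-and-retry loop for a local copy using SHA-256; it illustrates the idea only and is not how Globus Transfer implements integrity checking. `sha256_of` and `copy_verified` are hypothetical helpers.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data sets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src: Path, dst: Path, max_attempts: int = 3) -> int:
    """Copy src to dst, re-copying until the checksums match.

    Returns the number of attempts used; raises OSError if the copy never
    verifies within max_attempts.
    """
    expected = sha256_of(src)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, dst)
        if sha256_of(dst) == expected:
            return attempt
    raise OSError(f"checksum mismatch after {max_attempts} attempts: {dst}")


if __name__ == "__main__":
    # Demonstrate on a throwaway file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "reads.fastq"
        src.write_bytes(b"@read1\nACGT\n+\nIIII\n")
        dst = Path(tmp) / "copy.fastq"
        print("verified after", copy_verified(src, dst), "attempt(s)")
```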