1. Steven Newhouse
Head of Technical Services, EMBL-EBI
steven.newhouse@ebi.ac.uk
Globus in European Life-Science
GlobusWorld 2019
2. The European Molecular Biology Laboratory
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Monterotondo, Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
• 6 sites in Europe
• >1600 personnel
• 80+ nationalities
3. What is EMBL-EBI?
• Europe’s home for biological data services, research and training
• A trusted data provider for the life sciences
• International: 600 members of staff from 60 nations
OUR MISSION (1/5)
To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
4. Literature services
• BioStudies
• Europe PMC
Chemistry services
• ChEBI
• ChEMBL
• MetaboLights
• SureChEMBL
Macromolecular & cellular structure
• Protein Data Bank in Europe (PDBe)
• PDBe-KB
• Electron Microscopy Data Bank
• EMPIAR
Molecular atlas
• ArrayExpress
• Expression Atlas
• PRIDE
Proteins & protein families
• MGnify
• InterPro
• Pfam
• Rfam
• RNAcentral
• UniProt
Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
Molecular systems
• BioModels
• IntAct
• OmicsDI
• Reactome
Molecular archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• Experimental Factor Ontology
• BioSamples
• Mouse Resources
Data resources at EMBL-EBI
Cross-domain resources
5. What we do:
Data In → Validate → Correlate → Data Out
Volume: ~2PB/month
• FTP: 56%
• Aspera: 42%
• Globus: 2%
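As a rough illustration of what these shares mean in absolute terms, the sketch below splits the ~2PB/month figure across the three protocols. It assumes the percentages are shares of that same monthly volume, which the slide suggests but does not state outright; the function and variable names are illustrative only.

```python
# Approximate monthly volume from the slide, in petabytes.
# Assumption: the protocol percentages are shares of this same figure.
MONTHLY_VOLUME_PB = 2.0

# Protocol shares in percent, as listed on the slide.
PROTOCOL_SHARE_PCT = {"FTP": 56, "Aspera": 42, "Globus": 2}


def volume_by_protocol(total_pb, shares_pct):
    """Return the approximate PB/month carried by each protocol."""
    return {name: total_pb * pct / 100 for name, pct in shares_pct.items()}


if __name__ == "__main__":
    volumes = volume_by_protocol(MONTHLY_VOLUME_PB, PROTOCOL_SHARE_PCT)
    for name, pb in volumes.items():
        print(f"{name}: ~{pb:.2f} PB/month")
```

Under that assumption, Globus carries on the order of 0.04 PB (about 40 TB) per month.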
Analysis Capacity:
• HTC: 28,500 job slots
• HPC: 6,600 job slots
• Cloud: 6,000 vCPUs
• VMware: 1,500 cores
Raw Storage (241PB):
• Object Store: 103PB
• NAS: 81PB
• HPC Storage: 27PB
• Tape: 30PB
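The per-tier figures can be checked against the stated 241PB total with a few lines of arithmetic. This is only a sanity check on the numbers above; the dictionary and function names are illustrative.

```python
# Raw storage per tier in petabytes, as listed on the slide.
RAW_STORAGE_PB = {
    "Object Store": 103,
    "NAS": 81,
    "HPC Storage": 27,
    "Tape": 30,
}


def tier_shares(tiers_pb):
    """Return the total capacity and each tier's share of it in percent."""
    total = sum(tiers_pb.values())
    shares = {name: 100 * pb / total for name, pb in tiers_pb.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = tier_shares(RAW_STORAGE_PB)
    print(f"Total: {total} PB")
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
```

The tiers sum to the 241PB quoted in the heading, with the object store accounting for a little under 43% of raw capacity.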
~38 million requests to EMBL-EBI websites every day
EMBL-EBI delivered 140 million jobs to its users in 2017
Requests from 3.3 million unique hosts to the EMBL-EBI websites, each month
~1PB/month
6. ELIXIR – Research Infrastructure for Life Science
• Tools
Services & connectors to drive access and exploitation
• Standards
Integration and interoperability of data and services.
• Training
Professional skills for managing and exploiting data
• Compute
Access, Exchange & Compute on sensitive data
• Data
Sustain core data resources
7. Current Integration
• ELIXIR AAI & EMBL-EBI IdP
• Consistent ID provision across Europe and ELIXIR services
• Integrated into Globus Transfer
• Data Transfers
• From Data Resources (e.g. EMBL-EBI) to a researcher’s desktop
• From Data Resources (e.g. EMBL-EBI) to a cloud provider
• From a researcher’s institute to a cloud provider
8. Planned Overhaul of Transfer Infrastructure at EMBL-EBI
• Downloads
• Would like to move away from Aspera
• Performance w.r.t. Globus Transfer?
• Would like to increase use of Globus Transfer
• Understanding the barriers to adoption? Technical? Political?
• Uploads
• Moving towards an integrated upload infrastructure: common AAI & file space
• Explore the use of Globus Transfer: ease of use, installation, AAI & performance
• Current prototype uses Tus.io
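For context on the upload prototype: tus is an open protocol for resumable uploads in which the client asks the server how many bytes it already holds (a HEAD request returning an Upload-Offset) and then continues sending from that offset (PATCH requests). The sketch below simulates that offset logic entirely in memory. It is not the EMBL-EBI prototype and does not use a tus client library; the class and function names, the chunk size, and the simulated failure are all hypothetical.

```python
class InMemoryUploadServer:
    """Stand-in for an upload endpoint that tracks one upload's byte offset."""

    def __init__(self):
        self.received = bytearray()

    def head(self):
        # Mirrors a HEAD request: report how many bytes are already stored.
        return len(self.received)

    def patch(self, offset, chunk):
        # Mirrors a PATCH request: only accept data at the current offset.
        if offset != len(self.received):
            raise ValueError("offset mismatch")
        self.received.extend(chunk)
        return len(self.received)


def resume_upload(server, data, chunk_size=4, fail_after=None):
    """Send data in chunks, starting from the offset the server reports.

    fail_after simulates a dropped connection after that many chunks.
    """
    offset = server.head()
    sent_chunks = 0
    while offset < len(data):
        if fail_after is not None and sent_chunks >= fail_after:
            raise ConnectionError("simulated network failure")
        chunk = data[offset:offset + chunk_size]
        offset = server.patch(offset, chunk)
        sent_chunks += 1
    return offset


if __name__ == "__main__":
    payload = b"ACGTACGTACGT"
    server = InMemoryUploadServer()
    try:
        resume_upload(server, payload, fail_after=2)
    except ConnectionError:
        pass  # two chunks were stored before the simulated failure
    resume_upload(server, payload)  # continues from the stored offset
    assert bytes(server.received) == payload
```

The point of the design is that an interrupted upload never restarts from byte zero, which matters for large submissions over unreliable links.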
9. Future: Accessing Life-Science Data from Object Store
• FIRE: FIle REplication Service
• In existence for over 10 years
• Grown to over 20PB
• Evolution of technologies
• Previous: Distinct NFS systems
• Now: Distributed internal Object Store & tape
• Future: Distributed internal Object Store & cloud
• Challenge: Very long tail of data access patterns
• Need ‘shopping cart’ model to retrieve data from cold storage and deliver to endpoint
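One way to picture the 'shopping cart' model is as a request object that collects the files a user wants, separates those that must first be recalled from cold storage, and produces a single delivery plan for the target endpoint. The sketch below is a minimal illustration under that reading of the slide; it is not FIRE's interface, and every class, field, tier label, and identifier in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveObject:
    object_id: str
    tier: str  # hypothetical labels: "object_store" (online) or "tape" (cold)


@dataclass
class RetrievalCart:
    endpoint: str  # where the data should be delivered
    items: list = field(default_factory=list)

    def add(self, obj):
        """Add one archive object to the cart."""
        self.items.append(obj)

    def checkout(self):
        """Build one delivery plan, listing cold items that need staging."""
        to_stage = [o.object_id for o in self.items if o.tier == "tape"]
        online = [o.object_id for o in self.items if o.tier != "tape"]
        return {
            "endpoint": self.endpoint,
            "stage_from_cold": to_stage,
            # Online items can go first; staged items follow once recalled.
            "deliver": online + to_stage,
        }


if __name__ == "__main__":
    cart = RetrievalCart(endpoint="example-researcher-endpoint")
    cart.add(ArchiveObject("run-0001", "tape"))
    cart.add(ArchiveObject("run-0002", "object_store"))
    print(cart.checkout())
```

Batching requests this way lets slow cold-storage recalls be scheduled together rather than one file at a time.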
10. Future: Moving Data within a Hybrid Ecosystem
• European Open Science Cloud (EOSC)
• Federation of cloud resources (a.k.a. grid)
• Integration alongside commercial cloud resources
• More broadly the services needed for the research life-cycle
• ELIXIR Cloud Resources
• National & domain cloud resources will probably appear within EOSC
• EMBL-EBI Cloud Resources
• For our own purposes… need to move data from internal to cloud resources
• And for the community!
11. Summary
• Some use within EMBL-EBI for edge downloads
• Scope for more use and to integrate into uploads
• Need reliable transfer to underpin movement of data sets
• To users, service providers and public clouds
• Contact today:
• Steven Newhouse (steven.newhouse@ebi.ac.uk)
• Andrea Cristofori (crsndr@ebi.ac.uk)