Steven Newhouse
Head of Technical Services, EMBL-EBI
steven.newhouse@ebi.ac.uk
Globus in European Life-Science
GlobusWorld 2019
The European Molecular Biology Laboratory
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Monterotondo, Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
6 sites in Europe
>1600 personnel
80+ nationalities
What is EMBL-EBI?
• Europe’s home for biological data services, research and training
• A trusted data provider for the life sciences
• International: 600 members of staff from 60 nations
OUR MISSION (1/5)
To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
Literature services
• BioStudies
• Europe PMC
Chemistry services
• ChEBI
• ChEMBL
• MetaboLights
• SureChEMBL
Macromolecular & cellular structure
• Protein Data Bank in Europe (PDBe)
• PDBe-KB
• Electron Microscopy Data Bank
• EMPIAR
Molecular atlas
• ArrayExpress
• Expression Atlas
• PRIDE
Proteins & protein families
• MGnify
• InterPro
• Pfam
• Rfam
• RNAcentral
• UniProt
Genes, genomes & variation
• Ensembl
• Ensembl Genomes
• GWAS Catalog
Molecular systems
• BioModels
• IntAct
• OmicsDI
• Reactome
Molecular archives
• European Nucleotide Archive
• European Variation Archive
• European Genome-phenome Archive
• Experimental Factor Ontology
• BioSamples
• Mouse Resources
Data resources at EMBL-EBI
Cross-domain resources
What we do:
Data In → Validate → Correlate → Data Out
Volume: ~2PB/month
• FTP: 56%
• Aspera: 42%
• Globus: 2%
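As a rough worked example, the protocol shares above can be turned into approximate monthly volumes. This is only a sketch: it assumes the ~2PB/month figure is the total across all three protocols and that 1 PB = 1000 TB, neither of which the slide states.

```python
# Approximate monthly transfer volume per protocol, from the slide figures.
# Assumptions (not stated in the slides): the ~2 PB/month total covers all
# three protocols, and 1 PB = 1000 TB.
TOTAL_PB_PER_MONTH = 2.0
SHARE_PERCENT = {"FTP": 56, "Aspera": 42, "Globus": 2}


def volume_tb(total_pb: float, shares: dict[str, int]) -> dict[str, float]:
    """Return the approximate monthly volume in TB for each protocol."""
    return {name: total_pb * 1000 * pct / 100 for name, pct in shares.items()}


if __name__ == "__main__":
    for name, tb in volume_tb(TOTAL_PB_PER_MONTH, SHARE_PERCENT).items():
        print(f"{name}: ~{tb:.0f} TB/month")
```

Under these assumptions, even a 2% share corresponds to tens of terabytes per month moved through Globus.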
Analysis Capacity:
• HTC: 28,500 job slots
• HPC: 6,600 job slots
• Cloud: 6,000 vCPUs
• VMware: 1,500 cores
Raw Storage (241PB):
• Object Store: 103PB
• NAS: 81PB
• HPC Storage: 27PB
• Tape: 30PB
• ~38 million requests to EMBL-EBI websites every day
• EMBL-EBI delivered 140 million jobs to its users in 2017
• Requests from 3.3 million unique hosts to the EMBL-EBI websites each month
~1PB/month
ELIXIR – Research Infrastructure for Life Science
• Tools: Services & connectors to drive access and exploitation
• Standards: Integration and interoperability of data and services
• Training: Professional skills for managing and exploiting data
• Compute: Access, Exchange & Compute on sensitive data
• Data: Sustain core data resources
Current Integration
• ELIXIR AAI & EMBL-EBI IdP
• Consistent ID provision across Europe and ELIXIR services
• Integrated into Globus Transfer
• Data Transfers
• From Data Resources (e.g. EMBL-EBI) to a researcher’s desktop
• From Data Resources (e.g. EMBL-EBI) to a cloud provider
• From a researcher’s institute to a cloud provider
Planned Overhaul of Transfer Infrastructure at EMBL-EBI
• Downloads
• Would like to move away from Aspera
• Performance w.r.t. Globus Transfer?
• Would like to increase use of Globus Transfer
• Understanding the barriers to adoption? Technical? Political?
• Uploads
• Moving towards an integrated upload infrastructure: common AAI & file space
• Explore the use of Globus Transfer: ease of use, installation, AAI & performance
• Current prototype uses Tus.io
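Tus is an open protocol for resumable uploads over HTTP. As a minimal sketch of why that matters for large submissions, the helper below plans the PATCH requests a client would still need to send after an interrupted upload. The header names follow the tus 1.0.0 core protocol; the function name and the sizes used are illustrative assumptions and say nothing about how the EMBL-EBI prototype is implemented.

```python
# Plan the PATCH requests of a tus 1.0.0 resumable upload (no network I/O).
# plan_patches and its parameters are hypothetical, for illustration only.
TUS_VERSION = "1.0.0"


def plan_patches(file_size: int, server_offset: int, chunk_size: int) -> list[dict]:
    """Return the header sets for the PATCH requests still needed.

    server_offset is the Upload-Offset a HEAD request would report, i.e. the
    number of bytes the server already holds.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_offset <= file_size:
        raise ValueError("server_offset must lie within the file")
    requests = []
    offset = server_offset
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        requests.append({
            "Tus-Resumable": TUS_VERSION,
            "Upload-Offset": str(offset),
            "Content-Type": "application/offset+octet-stream",
            "Content-Length": str(length),
        })
        offset += length
    return requests
```

For example, `plan_patches(10, 4, 4)` plans two requests, starting at offsets 4 and 8, so the four bytes the server already holds are not sent again.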
Future: Accessing Life-Science Data from Object Store
• FIRE: FIle REplication Service
• In existence for over 10 years
• Grown to over 20PB
• Evolution of technologies
• Previous: Distinct NFS systems
• Now: Distributed internal Object Store & tape
• Future: Distributed internal Object Store & cloud
• Challenge: Very long tail of data access patterns
• Need ‘shopping cart’ model to retrieve data from cold storage and deliver to endpoint
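One way to picture the 'shopping cart' model: a user collects the objects they want, the service recalls any that sit in cold storage as a batch, and only then hands the complete set to a transfer endpoint. The sketch below simulates this in memory. The class, tier names, object identifiers and endpoint string are all hypothetical and do not describe FIRE's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical cart of object IDs awaiting delivery to one endpoint."""
    endpoint: str
    items: list[str] = field(default_factory=list)

    def add(self, object_id: str) -> None:
        if object_id not in self.items:  # ignore duplicate requests
            self.items.append(object_id)


def checkout(cart: Cart, tiers: dict[str, str]) -> dict:
    """Stage cold objects, then return a delivery manifest.

    tiers maps object ID -> "hot" or "cold" (an assumed two-tier layout).
    Recalled objects are marked "hot" in place to mimic staging from tape.
    """
    missing = [o for o in cart.items if o not in tiers]
    if missing:
        raise KeyError(f"unknown objects: {missing}")
    recalled = [o for o in cart.items if tiers[o] == "cold"]
    for object_id in recalled:
        tiers[object_id] = "hot"  # stand-in for a real recall operation
    return {
        "endpoint": cart.endpoint,
        "deliver": list(cart.items),
        "recalled": recalled,
    }
```

Batching the recall at checkout, rather than per object, is the point of the design: rarely accessed data in the long tail is staged once and delivered as a set.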
Future: Moving Data within a Hybrid Ecosystem
• European Open Science Cloud (EOSC)
• Federation of cloud resources (a.k.a. grid)
• Integration alongside commercial cloud resources
• More broadly the services needed for the research life-cycle
• ELIXIR Cloud Resources
• National & domain cloud resources will probably appear within EOSC
• EMBL-EBI Cloud Resources
• For our own purposes… need to move data from internal to cloud resources
• And for the community!
Summary
• Some use within EMBL-EBI for edge downloads
• Scope for more use and to integrate into uploads
• Need reliable transfer to underpin movement of data sets
• To users, service providers and public clouds
• Contact today:
• Steven Newhouse (steven.newhouse@ebi.ac.uk)
• Andrea Cristofori (crsndr@ebi.ac.uk)
