Globus in European Life Science

This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Steven Newhouse of the European Bioinformatics Institute (EMBL-EBI).

Published in: Technology

  1. Globus in European Life-Science, GlobusWorld 2019. Steven Newhouse, Head of Technical Services, EMBL-EBI (steven.newhouse@ebi.ac.uk)
  2. The European Molecular Biology Laboratory
       • 6 sites in Europe
           • Heidelberg, Germany: Main Laboratory
           • Barcelona, Spain: Tissue Biology, Disease Modeling
           • Hinxton, Cambridge, UK: Bioinformatics
           • Monterotondo, Rome, Italy: Mouse Biology
           • Grenoble, France: Structural Biology
           • Hamburg, Germany: Structural Biology
       • >1600 personnel
       • 80+ nationalities
  3. What is EMBL-EBI?
       • Europe’s home for biological data services, research and training
       • A trusted data provider for the life sciences
       • International: 600 members of staff from 60 nations
       • Our mission (1/5): to provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
  4. Data resources at EMBL-EBI
       • Literature services: BioStudies, Europe PMC
       • Chemistry services: ChEBI, ChEMBL, MetaboLights, SureChEMBL
       • Macromolecular & cellular structure: Protein Data Bank in Europe (PDBe), PDBe-KB, Electron Microscopy Data Bank, EMPIAR
       • Molecular atlas: Array Express, Expression Atlas, PRIDE
       • Proteins & protein families: MGnify, InterPro, Pfam, Rfam, RNA Central, UniProt
       • Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog
       • Molecular systems: BioModels, IntAct, OmicsDI, Reactome
       • Molecular archives: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Experimental Factor Ontology, BioSamples, Mouse Resources
       • Cross domain resources
  5. What we do: Data In, Validate, Correlate, Data Out
       • Data Out Volume: ~2PB/month
           • FTP: 56%
           • Aspera: 42%
           • Globus: 2%
       • Analysis Capacity:
           • HTC: 28,500 job slots
           • HPC: 6,600 job slots
           • Cloud: 6,000 vCPUs
           • VMware: 1,500 cores
       • Raw Storage (241PB):
           • Object Store: 103PB
           • NAS: 81PB
           • HPC Storage: 27PB
           • Tape: 30PB
       • ~38 million requests to EMBL-EBI websites every day
       • EMBL-EBI delivered 140 million jobs to its users in 2017
       • Requests from 3.3 million unique hosts to the EMBL-EBI websites each month
       • ~1PB/month
  6. ELIXIR – Research Infrastructure for Life Science
       • Tools: Services & connectors to drive access and exploitation
       • Standards: Integration and interoperability of data and services
       • Training: Professional skills for managing and exploiting data
       • Compute: Access, Exchange & Compute on sensitive data
       • Data: Sustain core data resources
  7. Current Integration
       • ELIXIR AAI & EMBL-EBI IdP
           • Consistent ID provision across Europe and ELIXIR services
           • Integrated into Globus Transfer
       • Data Transfers
           • From Data Resources (e.g. EMBL-EBI) to a researcher’s desktop
           • From Data Resources (e.g. EMBL-EBI) to a cloud provider
           • From a researcher’s institute to a cloud provider
  8. Planned Overhaul of Transfer Infrastructure at EMBL-EBI
       • Downloads
           • Would like to move away from Aspera
           • Performance w.r.t. Globus Transfer?
           • Would like to increase use of Globus Transfer
           • Understanding the barriers to adoption? Technical? Political?
       • Uploads
           • Moving towards an integrated upload infrastructure: common AAI & file space
           • Explore the use of Globus Transfer: ease of use, installation, AAI & performance
           • Current prototype uses Tus.io
  9. Future: Accessing Life-Science Data from Object Store
       • FIRE: FIle REplication Service
           • In existence for over 10 years
           • Grown to over 20PB
       • Evolution of technologies
           • Previous: Distinct NFS systems
           • Now: Distributed internal Object Store & tape
           • Future: Distributed internal Object Store & cloud
       • Challenge: Very long tail of data access patterns
           • Need ‘shopping cart’ model to retrieve data from cold storage and deliver to endpoint
  10. Future: Moving Data within a Hybrid Ecosystem
       • European Open Science Cloud (EOSC)
           • Federation of cloud resources (a.k.a. grid)
           • Integration alongside commercial cloud resources
           • More broadly the services needed for the research life-cycle
       • ELIXIR Cloud Resources
           • National & domain cloud resources will probably appear within EOSC
       • EMBL-EBI Cloud Resources
           • For our own purposes… need to move data from internal to cloud resources
           • And for the community!
  11. Summary
       • Some use within EMBL-EBI for edge downloads
       • Scope for more use and to integrate into uploads
       • Need reliable transfer to underpin movement of data sets
           • To users, service providers and public clouds
       • Contact today:
           • Steven Newhouse (steven.newhouse@ebi.ac.uk)
           • Andrea Cristofori (crsndr@ebi.ac.uk)
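
Slide 5 quotes a monthly transfer volume and a percentage share per protocol, but not the volume each protocol carries. The short Python sketch below works that out and checks that the storage tiers add up to the 241PB headline. It assumes the ~2PB/month figure is the outgoing volume that the FTP/Aspera/Globus split applies to, and it takes 1PB as 1000TB; the variable and function names are illustrative only, and the per-protocol results are rough because the slide's total is itself approximate.

```python
# Figures quoted on slide 5. The per-protocol volumes computed below are
# derived here for illustration; they are not stated on the slide.
DATA_OUT_PB_PER_MONTH = 2.0  # "~2PB/month" (approximate; assumed to be outgoing traffic)
PROTOCOL_SHARE_PERCENT = {"FTP": 56, "Aspera": 42, "Globus": 2}
RAW_STORAGE_PB = {"Object Store": 103, "NAS": 81, "HPC Storage": 27, "Tape": 30}


def monthly_volume_tb(total_pb, share_percent):
    """Approximate monthly volume in TB for one protocol (1 PB taken as 1000 TB)."""
    return total_pb * 1000 * share_percent / 100


# Volume carried by each protocol per month, in TB.
volumes_tb = {
    name: monthly_volume_tb(DATA_OUT_PB_PER_MONTH, percent)
    for name, percent in PROTOCOL_SHARE_PERCENT.items()
}

# Sum of the four storage tiers, to compare with the slide's 241PB headline.
total_raw_storage_pb = sum(RAW_STORAGE_PB.values())

for name, tb in volumes_tb.items():
    print(f"{name}: ~{tb:.0f} TB/month")
print(f"Raw storage total: {total_raw_storage_pb} PB")
```

On these assumptions, Globus at 2% corresponds to roughly 40TB a month, which gives a sense of the headroom behind the stated wish to increase use of Globus Transfer.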

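Slide 9 asks for a ‘shopping cart’ model to retrieve data from cold storage and deliver it to an endpoint, without describing a design. The Python sketch below is one possible reading of that idea, not the FIRE implementation: a user collects object identifiers in a cart, and checkout recalls any tape-resident objects before producing a single batched delivery request. Every name in it (`ColdStore`, `Cart`, `checkout`, the object ids and the endpoint label) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColdStore:
    """Hypothetical archive: maps object id -> storage tier ("disk" or "tape")."""
    tiers: dict

    def stage(self, object_id):
        """Recall a tape-resident object to disk; return True if a recall was needed."""
        if self.tiers[object_id] == "tape":
            self.tiers[object_id] = "disk"
            return True
        return False


@dataclass
class Cart:
    """Hypothetical cart: the objects a user wants, and where to deliver them."""
    endpoint: str
    items: list = field(default_factory=list)

    def add(self, object_id):
        # Ignore duplicates so each object is staged and delivered once.
        if object_id not in self.items:
            self.items.append(object_id)


def checkout(cart, store):
    """Stage every item in the cart, then return one batched delivery request."""
    recalled = [object_id for object_id in cart.items if store.stage(object_id)]
    return {"endpoint": cart.endpoint, "deliver": list(cart.items), "recalled": recalled}


store = ColdStore({"run-001": "tape", "run-002": "disk", "run-003": "tape"})
cart = Cart(endpoint="researcher-laptop")
cart.add("run-001")
cart.add("run-002")
cart.add("run-001")  # duplicate, ignored
request = checkout(cart, store)
print(request)
```

Batching at checkout is the point of the model: with a very long tail of access patterns, tape recalls can be grouped per request rather than triggered one object at a time, and the resulting delivery list could then be handed to a transfer service.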