European Molecular Biology Laboratory
European Bioinformatics Institute
The home for big data in biology
What is EMBL-EBI?
Europe’s home for biological data services, research and training
A trusted data provider for the life sciences
Part of the European Molecular Biology Laboratory, an
intergovernmental research organisation
International: 650 members of staff from 66 nations
EMBL member states
Austria, Belgium, Croatia, Czech Republic,
Denmark, Finland, France, Germany, Greece,
Hungary, Iceland, Ireland, Israel, Italy,
Luxembourg, Malta, Montenegro, the
Netherlands, Norway, Portugal, Slovakia, Spain,
Sweden, Switzerland and the United Kingdom
Associate member states: Argentina, Australia
Prospect member states: Lithuania, Poland
Our mission
• Deliver excellent research
• Train the next generation of scientists
• Engage with industry
• Coordinate bioinformatics in Europe
• Deliver scientific services
The European Molecular Biology Laboratory
6 sites in Europe, >1700 personnel, 80+ nationalities
• Heidelberg, Germany: Main Laboratory
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Hinxton, Cambridge, UK: Bioinformatics
• Rome, Italy: Mouse Biology
• Grenoble, France: Structural Biology
• Hamburg, Germany: Structural Biology
Data resources at EMBL-EBI
Database interactions
• Data exchange between EBI
data resources
• Arc width weighted by the
number of different data types
exchanged
Data volume doubles every two years
• => half of our data will always be < 2 years old
EGA and ENA account for the bulk of the data
• DNA sequences
BioImaging repository
• Just starting, will be big
[Chart: data volume, axis from 1 GB to 1 PB, 2004 to 2019]
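The statement that half of the data is always less than two years old follows directly from a fixed doubling time. A minimal Python sketch, assuming idealised exponential growth with a two-year doubling period (the function name and the normalised volume are illustrative, not EMBL-EBI figures):

```python
def fraction_younger_than(age_years: float, doubling_years: float = 2.0) -> float:
    """Fraction of the current archive added within the last `age_years`,
    assuming the total volume grows as 2 ** (t / doubling_years)."""
    # Volume that already existed `age_years` ago, relative to today's volume.
    older_share = 2.0 ** (-age_years / doubling_years)
    return 1.0 - older_share


if __name__ == "__main__":
    for age in (1, 2, 4):
        print(f"data younger than {age} years: {fraction_younger_than(age):.1%}")
```

Under this assumption, half of the archive is always younger than one doubling period and three quarters is younger than two.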
And is getting cheaper to produce
[Chart: Cost per Human Genome compared with Moore’s Law, axis from $100 to $100M, 2001 to 2020]
See the live map at www.ebi.ac.uk/about/our-impact
EMBL-EBI
Current Scientific Data Repositories
ARCHIVER “current state of the art” report: https://doi.org/10.5281/zenodo.3618215
EMBL1–FIRE
PIC2–MixFileStorage
DESY1–IndividualScientist
CERN2–CERNOpenData
CERN3–CERNDigitalMemory
CERN1–TheBaBarExperiment
PIC3–DataDistribution
EMBL2–CloudCaching
PIC1–LargeFileStorage
DESY2–PetraIIIExperiment
DESY3–EUXFELExperiment
https://archiver-project.eu/early-adopters-programme
European Open Science Cloud (EOSC)
slide courtesy of Bob Jones (EOSC Sustainability Working Group, CERN)
• Analysis results considered in the competitive R&D tender
• Technical and organisational measures aligned with European legislation in the services being developed (by default & by design)

• Additional use cases expanding further the set of supported scientific domains
• Publicly funded research actors external to the ARCHIVER consortium

• For consortium members and Early Adopter organisations
• Beyond the lifetime of the project

ARCHIVER is the only EOSC related H2020 project focusing on Archiving & Long Term Data Preservation services for PetaByte scale datasets across multiple research domains and countries.
Page 36| DESY – Archiver use case overview | S. Yakubov, M. Gasthuber | 08/06/2020 |
Main sources of data to be archived and preserved
>30PB annual
2-4PB annual
● two sites
○ Hamburg
○ Zeuthen (near Berlin)
● science areas
○ particle physics (LHC, Belle 2, …)
○ photon science (EuXFEL, Petra III, FLASH)
○ accelerator research (wakefield, Petra IV, …)
○ astrophysics
● all areas “data intensive science”
Archiver challenges
Axes across the three tiers: automation; scale (#objects, volume, bandwidth); API/CLI usage / less interactive

individual scientist / small working groups
• scientist is the archivist
• publication material + condensed data + reference to full datasets
• DOI handling
• mainly interactive access
• few TB, 100MB/sec, 10K objects
• ~0.2-0.5PB annual
• more or less ‘classical preservation model/practices’

mid-size working groups (Petra III experiment)
• nominated member of the group is the archivist (on behalf of)
• raw + derived data + code
• DOI + open-data handling
• comply with site data policy
• few 10TB, 1-2GB/sec, >150K objects
• <50% interactive access
• ~2-4PB annual

large collaboration / site management (EuXFEL organization)
• site nominated archivist responsible for all experiments
• raw + calibration data + code
• DOI + open-data handling
• comply with site data policy
• few 100TB, 2-10GB/sec, >30K obj.
• very low interactive access
• >30PB annual
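To relate the annual volumes above to bandwidth, the sketch below converts a yearly volume into the average sustained ingest rate it implies. It assumes decimal units (1 PB = 10^15 bytes), a 365-day year and perfectly even ingest, none of which is stated on the slide; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year, ingest spread evenly
BYTES_PER_PB = 10**15               # assumption: decimal petabytes
BYTES_PER_GB = 10**9                # assumption: decimal gigabytes


def average_ingest_gb_per_s(annual_volume_pb: float) -> float:
    """Average rate in GB/s needed to ingest `annual_volume_pb` within one year."""
    return annual_volume_pb * BYTES_PER_PB / SECONDS_PER_YEAR / BYTES_PER_GB


if __name__ == "__main__":
    # Annual volumes from the three tiers on the slide
    # (lower bound used where the slide gives a range or ">").
    tiers = [
        ("individual / small groups", 0.2),
        ("mid-size (Petra III)", 2.0),
        ("large (EuXFEL)", 30.0),
    ]
    for label, volume_pb in tiers:
        rate = average_ingest_gb_per_s(volume_pb)
        print(f"{label}: {volume_pb} PB/year ~ {rate:.3f} GB/s sustained")
```

The rates quoted on the slide (for example 2-10GB/sec for the largest tier) are higher than these yearly averages, which would be consistent with ingest arriving in bursts rather than evenly.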
Early Adopters Programme
WHAT? WHY? HOW?
WHAT?
Public sector & not-for-profit organisations interested in the ARCHIVER PCP
• Help to shape the R&D
• Test the solutions developed
• Potential to purchase pilot-scale services
WHY?
Becoming an Early Adopter means:
• Be consulted during the preparation of future ARCHIVER phases
• Access material produced by the project
• Propose your own use cases and get the chance to test resulting services
• Benefit from training sessions covering the services developed during the ARCHIVER project
• Accelerate the procurement process of pilot-scale services & have certain conditions
What are the obligations as an Early Adopter of ARCHIVER?
• Sign a declaration of confidentiality and non-conflict of interest, stating that your organisation will not submit a bid in response to the ARCHIVER Request for Tender
• Allow the ARCHIVER Buyers Group to list your organisation’s name in its Request for Tenders and subsequent Call-offs
• In case of engagement in testing activities, describe the use case(s) to potentially test using the ARCHIVER services and provide structured feedback on the testing results to the ARCHIVER project
• Acknowledge the support of the European Commission and the ARCHIVER project in any publications that result from the aforementioned testing activities performed with the developed services
The Early Adopters engaged so far
Use cases
• Archival and accessibility of omics data
• Archiving Genomic and Imaging Data
• Multi-Repository Research Data Harvester and Transformer for Swedish Archival Standard
• Preserving Australia’s digital research, education and cultural heritage
• Defining National Scale Data Archive Services
https://archiver-project.eu/early-adopters-use-cases
HOW?
Are you part of a public sector research organisation with needs for standards-based, cost-effective data archiving and preservation services?
Are high ingest rates, data volumes at scale and long-term support important to you?
Express your interest
Do you want to know more about the Early Adopters Programme?
https://archiver-project.eu/early-adopters-programme
ARCHIVER
Arkivum and Google solution
Phase 2: Prototype
Arkivum Perpetua: Cloud Hosted Digital Preservation and Archiving
Producers (Content Sources): Experiments, Labs, Repositories, Local Servers, Service Providers
Submit & Validate: Transfer, Checks & Validation, Metadata Extraction, Ingestion & Organisation, Retention Management
Preserve & Safeguard: File Format Identification, Characterisation, Validation, Normalisation, Packaging (AIP/DIP)
Discovery & Access: Index & Dedupe, Search & Navigate, View, Secure Export, Publish
Consumers (Content Destinations): Staff, Researcher, Collaborators, Public, Media
Arkivum / Google Solution:
• Scalable storage and compute
• High speed ingest and access
• Policy based cost optimization
• OAIS workflows and packages
• Digital Preservation rules and actions
• FAIR datasets and access
• Hosted scientific applications
• Open standards and specifications
• Exit and migration strategies
Google Cloud Platform: PB Scale Storage, Compute and Networking
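A "Checks & Validation" step in a preservation workflow commonly includes a fixity check, that is, comparing each transferred file against a checksum recorded by the producer. The following is a generic sketch of that idea using only the Python standard library. It is not Arkivum's implementation; the manifest layout (a mapping from relative path to SHA-256 hex digest) and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs.

    `manifest` is a hypothetical structure: relative path -> expected SHA-256 hex digest.
    """
    failures = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list from `verify_manifest` means every listed file arrived intact; any returned path would be a candidate for re-transfer before ingestion continues.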
Prototype: Portability and Exit Strategies
• Deployment in GCP, on-premise and hybrid cloud
• Portable to other cloud providers
• Kubernetes, containers, Anthos, automated deployment
• Exit strategies using data escrow, open standards and fast exports
Portland Common Data Model
Pilot: Long Term Digital Preservation Hosted On GCP
Prototype: Factories for LTDP in Large Scale Science
Prototype: Approach
• Automation, Scalability and Efficiency: Preservation Factories
• Minimal Effort Ingest / Minimal Viable Preservation
• Dataset Authenticity, Integrity and Usability: FAIR
• Platform for building Trusted Digital Repositories
• Fully SaaS on GCP, but also portable to on-premise and hybrid deployments
Thank you
https://www.archiver-project.eu/
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and Ceremony
Prototype Phase Kick-off Event and Ceremony

More Related Content

What's hot

PHIDIAS - Boosting the use of cloud services for marine data management, serv...
PHIDIAS - Boosting the use of cloud services for marine data management, serv...PHIDIAS - Boosting the use of cloud services for marine data management, serv...
PHIDIAS - Boosting the use of cloud services for marine data management, serv...
Phidias
 

What's hot (20)

UK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas WorkshopUK RepositoryNet+ Mimas Workshop
UK RepositoryNet+ Mimas Workshop
 
Anne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at InaAnne couteux - Audiovisual archiving at Ina
Anne couteux - Audiovisual archiving at Ina
 
PHIDIAS - Boosting the use of cloud services for marine data management, serv...
PHIDIAS - Boosting the use of cloud services for marine data management, serv...PHIDIAS - Boosting the use of cloud services for marine data management, serv...
PHIDIAS - Boosting the use of cloud services for marine data management, serv...
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
 
Phidias: Steps forward in detection and identification of anomalous atmospher...
Phidias: Steps forward in detection and identification of anomalous atmospher...Phidias: Steps forward in detection and identification of anomalous atmospher...
Phidias: Steps forward in detection and identification of anomalous atmospher...
 
Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]Who is doing what, and how do we know? [PEPRS]
Who is doing what, and how do we know? [PEPRS]
 
Geoservices Activities at EDINA
Geoservices Activities at EDINAGeoservices Activities at EDINA
Geoservices Activities at EDINA
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday Genereux
 
The Archiver project
The Archiver projectThe Archiver project
The Archiver project
 
Joining Forces in Digitisation, Storage and Access. An interim assessment aft...
Joining Forces in Digitisation, Storage and Access. An interim assessment aft...Joining Forces in Digitisation, Storage and Access. An interim assessment aft...
Joining Forces in Digitisation, Storage and Access. An interim assessment aft...
 
FUTEBOL - Federated Union of Telecommunications Research Facilities for an EU...
FUTEBOL - Federated Union of Telecommunications Research Facilities for an EU...FUTEBOL - Federated Union of Telecommunications Research Facilities for an EU...
FUTEBOL - Federated Union of Telecommunications Research Facilities for an EU...
 
DESY / XFEL Deployment Scenarios
DESY / XFEL Deployment Scenarios  DESY / XFEL Deployment Scenarios
DESY / XFEL Deployment Scenarios
 
Summary of the Deployment Scenarios and Functional Requirements
Summary of the Deployment Scenarios and Functional RequirementsSummary of the Deployment Scenarios and Functional Requirements
Summary of the Deployment Scenarios and Functional Requirements
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challenges
 
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch CatalogueExposing EO Linked (meta-)Data from OpenSearch Catalogue
Exposing EO Linked (meta-)Data from OpenSearch Catalogue
 
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
Gergely Sipos, Claudio Cacciari: Welcome and mapping the landscape: EOSC-hub ...
 
Eva lis green - Evolution of Professional skill sets
Eva lis green - Evolution of Professional skill setsEva lis green - Evolution of Professional skill sets
Eva lis green - Evolution of Professional skill sets
 
A framework for visual search in broadcasting companies' multimedia archives
A framework for visual search in broadcasting companies' multimedia archives A framework for visual search in broadcasting companies' multimedia archives
A framework for visual search in broadcasting companies' multimedia archives
 
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...Enabling better science - Results and vision of the OpenAIRE infrastructure a...
Enabling better science - Results and vision of the OpenAIRE infrastructure a...
 
Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...Enabling efficient movement of data into & out of a high-performance analysis...
Enabling efficient movement of data into & out of a high-performance analysis...
 

Similar to Prototype Phase Kick-off Event and Ceremony

Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
Karlsruhe Institute of Technology (KIT)
 
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
OpenAIRE
 

Similar to Prototype Phase Kick-off Event and Ceremony (20)

A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...A user journey in OpenAIRE services through the lens of repository managers -...
A user journey in OpenAIRE services through the lens of repository managers -...
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org Registry
 
Globus in European Life Science
Globus in European Life ScienceGlobus in European Life Science
Globus in European Life Science
 
Finalrevc
FinalrevcFinalrevc
Finalrevc
 
Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Scholze imcw 2014-11-25
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
 
The BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative researchThe BlueBRIDGE approach to collaborative research
The BlueBRIDGE approach to collaborative research
 
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
Infrastructure for the Data Revolution: How OpenAIRE supports the EC’s Open ...
 
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving...
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
E Infrastructure for OA
E Infrastructure for OAE Infrastructure for OA
E Infrastructure for OA
 
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing ServicesArchiver at CS3 - Cloud Storage Synchronization and Sharing Services
Archiver at CS3 - Cloud Storage Synchronization and Sharing Services
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...A Service Perspective: Unlocking metadata to enhance discoverability and conn...
A Service Perspective: Unlocking metadata to enhance discoverability and conn...
 
Forschungsdaten-Repositorien Typen, Herausforderungen und Perspektiven
Forschungsdaten-Repositorien Typen, Herausforderungen und PerspektivenForschungsdaten-Repositorien Typen, Herausforderungen und Perspektiven
Forschungsdaten-Repositorien Typen, Herausforderungen und Perspektiven
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
Optique presentation
Optique presentationOptique presentation
Optique presentation
 
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
OpenAIRE services and tools - 6th National Open Access Conference and OpenAIR...
 

More from Archiver

More from Archiver (20)

Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶Wrapping Up and Next Steps¶
Wrapping Up and Next Steps¶
 
Overview of the EOSC¶
Overview of the EOSC¶Overview of the EOSC¶
Overview of the EOSC¶
 
ARCHIVER Tender Requirements
ARCHIVER Tender RequirementsARCHIVER Tender Requirements
ARCHIVER Tender Requirements
 
Project update - João Fernandes
Project update - João FernandesProject update - João Fernandes
Project update - João Fernandes
 
Wrapping up and_next_steps_stansted
Wrapping up and_next_steps_stanstedWrapping up and_next_steps_stansted
Wrapping up and_next_steps_stansted
 
20190523 archiver fim
20190523 archiver fim20190523 archiver fim
20190523 archiver fim
 
Geant cloud peering-v2
Geant cloud peering-v2Geant cloud peering-v2
Geant cloud peering-v2
 
Archiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_finalArchiver omc stansted_tendering_procedure_and_requirements_final
Archiver omc stansted_tendering_procedure_and_requirements_final
 
Archiver 3rd omc_project_overview
Archiver 3rd omc_project_overviewArchiver 3rd omc_project_overview
Archiver 3rd omc_project_overview
 
Wrapping up_and_next_steps
Wrapping up_and_next_stepsWrapping up_and_next_steps
Wrapping up_and_next_steps
 
Introduction to_planning_poker_addestino
Introduction to_planning_poker_addestinoIntroduction to_planning_poker_addestino
Introduction to_planning_poker_addestino
 
Archiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project OverviewArchiver 2nd_OMC event_Barcelona_Project Overview
Archiver 2nd_OMC event_Barcelona_Project Overview
 
Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio Archiver OMC event_Barcelona_ Welcome to_accio
Archiver OMC event_Barcelona_ Welcome to_accio
 
6 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v26 presentation wrapping up and next steps v2
6 presentation wrapping up and next steps v2
 
5 introduction to geant
5 introduction to geant5 introduction to geant
5 introduction to geant
 
4 archiver omc session 1
4 archiver omc session 1 4 archiver omc session 1
4 archiver omc session 1
 
3 archiver omc deployment_scenarios
3 archiver omc deployment_scenarios3 archiver omc deployment_scenarios
3 archiver omc deployment_scenarios
 
2 procurement and legal aspects
2 procurement and legal aspects 2 procurement and legal aspects
2 procurement and legal aspects
 
1 archiver omc project_overview
1 archiver omc project_overview1 archiver omc project_overview
1 archiver omc project_overview
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Prototype Phase Kick-off Event and Ceremony

  • 1.
  • 2.
  • 3. European Molecular Biology Laboratory European Bioinformatics Institute The home for big data in biology
  • 4. What is EMBL-EBI? Europe’s home for biological data services, research and training A trusted data provider for the life sciences Part of the European Molecular Biology Laboratory, an intergovernmental research organisation International: 650 members of staff from 66 nations
  • 5. EMBL member states Austria, Belgium, Croatia, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Luxembourg, Malta, Montenegro, the Netherlands, Norway, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom Associate member states: Argentina, Australia Prospect member states: Lithuania, Poland
  • 6. Our mission Deliver excellent research Train the next generation of scientists Engage with industry Coordinate bioinformatics in Europe Deliver scientific services
  • 7. The European Molecular Biology Laboratory Heidelberg, Germany Main Laboratory Barcelona, Spain Tissue Biology, Disease Modeling 80+ nationalities Hinxton, Cambridge, UK Bioinformatics Mouse Biology Rome, Italy >1700 personnel Grenoble, France Hamburg, Germany Structural Biology 6 sites in Europe Structural Biology
  • 8. Data resources at EMBL-EBI
  • 9. Database interactions • Data exchange between EBI data resources • Arc width weighted by the number of different data types exchanged
  • 10. Data volume doubles every two years • => half of our data will always be < 2 years old EGA and ENA account for the bulk of the data • DNA sequences BioImaging repository • Just starting, will be big 1 PB 1 TB 1 GB 2004 2019
  • 11. And is getting cheaper to produce $100M $1M $10K $100 2001 2020 Cost per Human Genome Moore’s Law
  • 12. See the live map at www.ebi.ac.uk/about/our-impact
  • 14.
  • 15.
  • 16.
  • 17.
  • 18. Current Scientific Data Repositories ARCHIVER “current state of the art” report: https://doi.org/10.5281/zenodo.3618215
  • 20.
  • 22. European Open Science Cloud (EOSC) 22 slide courtesy of Bob Jones (EOSC Sustainability Working Group, CERN)
  • 23. 23slide courtesy of Bob Jones (EOSC Sustainability Working Group, CERN)
  • 24. slide courtesy of Bob Jones (EOSC Sustainability Working Group, CERN)
  • 25. • • Analysis results considered in the competitive R&D tender • Technical and organisational measures aligned with European legislation in the services being developed (by default & by design) • • Additional use cases expanding further the set of supported scientific domains • Publicly funded research actors external to the ARCHIVER consortium • • For consortium members and Early Adopter organisations • Beyond the lifetime of the project ARCHIVER is the only EOSC related H2020 project focusing on Archiving & Long Term Data Preservation services for PetaByte scale datasets across multiple research domains and countries.
  • 26.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. Page 36| DESY – Archiver use case overview | S. Yakubov, M. Gasthuber | 08/06/2020 | Main sources of data to be archived and preserved >30PB annual 2-4PB annual ● two sites ○ Hamburg ○ Zeuthen (near Berlin) ● science areas ○ particle physics (LHC, Belle 2, …) ○ photon science (EuXFEL, Petra III, FLASH) ○ accelerator research (wakefield, Petra IV, …) ○ astrophysics ● all areas “data intensive science”
  • 37. Page 37 automation scale - #objects, volume, bandwidth individual scientist / small working groups mid-size working groups (Petra III experiment) • scientist is the archivist • publication material + condensed data + reference to full datasets • DOI handling • mainly interactive access • few TB, 100MB/sec, 10K objects • ~0.2-0.5PB annual • more or less ‘classical preservation model/practices’ Archiver challenges large collaboration / site management (EuXFEL organization) • nominated member of the group is the archivist (on behalf of) • raw + derived data + code • DOI + open-data handling • comply with site data policy • few 10TB, 1-2GB/sec, >150K objects • <50% interactive access • ~2-4PB annual • site nominated archivist responsible for all experiments • raw + calibration data + code • DOI + open-data handling • comply with site data policy • few 100TB, 2-10GB/sec, >30K obj. • very low interactive access • >30PB annual API/CLI usage / less interactive
  • 44. Data volume doubles every two years • => half of our data will always be < 2 years old
    Chart axis labels: 1 GB, 1 TB, 1 PB (data volume); 2004 to 2019
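The "half of our data" claim follows from the doubling assumption. If total volume grows as V(t) = V0 · 2^(t/T) with doubling period T, then the share of current holdings added in the last a years is (V(t) − V(t − a)) / V(t) = 1 − 2^(−a/T), which is 1/2 when a = T = 2 years. A minimal sketch, assuming smooth exponential growth and no deletions (the function name is hypothetical):

```python
def fraction_younger_than(age_years: float, doubling_years: float = 2.0) -> float:
    """Share of current holdings added within the last `age_years`.

    Assumes volume grows as V(t) = V0 * 2 ** (t / doubling_years)
    and that nothing is ever deleted.
    """
    return 1.0 - 2.0 ** (-age_years / doubling_years)


if __name__ == "__main__":
    for age in (1, 2, 4):
        print(f"younger than {age} years: {fraction_younger_than(age):.1%}")
```

The same formula shows that, under this model, three quarters of the holdings are less than four years old.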
  • 52. Early Adopters Programme: WHAT? WHY? HOW?
  • 53. WHAT?
    • Public sector & not-for-profit organisations interested in the ARCHIVER PCP
    • Help to shape the R&D
    • Test the solutions developed
    • Potential to purchase pilot-scale services
  • 54. WHY? Becoming an Early Adopter means:
    • Be consulted during the preparation of future ARCHIVER phases
    • Access material produced by the project
    • Propose your own use cases and get the chance to test resulting services
    • Benefit from training sessions covering the services developed during the ARCHIVER project
    • Accelerate the procurement process of pilot-scale services & have certain conditions
  • 55. What are the obligations as an Early Adopter of ARCHIVER?
    • Sign a declaration of confidentiality and non-conflict of interest, stating that your organisation will not submit a bid in response to the ARCHIVER Request for Tender
    • Allow the ARCHIVER Buyers Group to list your organisation’s name in its Request for Tenders and subsequent Call-offs
    • In case of engagement in testing activities, describe the use case(s) to potentially test using the ARCHIVER services and provide structured feedback on the testing results to the ARCHIVER project
    • Acknowledge the support of the European Commission and the ARCHIVER project in any publications that result from the aforementioned testing activities performed with the developed services
  • 56. The Early Adopters engaged so far
  • 57. Use cases
    • Archival and accessibility of omics data
    • Archiving Genomic and Imaging Data
    • Multi-Repository Research Data Harvester and Transformer for Swedish Archival Standard
    • Preserving Australia’s digital research, education and cultural heritage
    • Defining National Scale Data Archive Services
    https://archiver-project.eu/early-adopters-use-cases
  • 58. HOW?
    Are you part of a public sector research organisation that needs standards-based, cost-effective data archiving and preservation services? Are high ingest rates, data volumes at scale and long-term support important to you?
    Express your interest
  • 59. Do you want to know more about the Early Adopters Programme? https://archiver-project.eu/early-adopters-programme
  • 89. ARCHIVER Arkivum and Google solution Phase 2: Prototype
  • 90. Arkivum Perpetua: Cloud Hosted Digital Preservation and Archiving
  • 91. Stages: Submit & Validate → Preserve & Safeguard → Discovery & Access
    Producers (Content Sources): Experiments, Labs, Repositories, Local Servers, Service Providers
    Activities (in slide order): Transfer, Checks & Validation, Metadata Extraction, Ingestion & Organisation, Retention Management, File Format Identification, Characterisation, Validation, Normalisation, Packaging (AIP/DIP), Index & Dedupe, Search & Navigate, View, Secure Export, Publish
    Consumers (Content Destinations): Staff, Researcher, Collaborators, Public, Media
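The "Checks & Validation" step in a submission workflow of this kind typically includes a fixity check: recomputing checksums after transfer and comparing them with the values the producer supplied. The sketch below is a generic illustration using only the Python standard library. It is not the Arkivum Perpetua API, and the manifest layout (relative path mapped to a SHA-256 hex digest) and function names are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare files under `root` with expected digests.

    `manifest` maps a relative path to its expected SHA-256 hex digest
    (a hypothetical layout). Returns a status per path:
    'ok', 'mismatch' or 'missing'.
    """
    report = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            report[rel_path] = "missing"
        elif sha256_of(target) == expected.lower():
            report[rel_path] = "ok"
        else:
            report[rel_path] = "mismatch"
    return report
```

A caller would run `verify_manifest` on the received directory and accept the submission only if every status is `ok`. The same digests could also serve as keys for a deduplication index.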
  • 92. Arkivum / Google Solution: • Scalable storage and compute • High speed ingest and access • Policy-based cost optimization • OAIS workflows and packages • Digital Preservation rules and actions • FAIR datasets and access • Hosted scientific applications • Open standards and specifications • Exit and migration strategies
  • 93. Google Cloud Platform: PB Scale Storage, Compute and Networking
  • 94. Prototype: Portability and Exit Strategies • Deployment in GCP, on-premise and hybrid cloud • Portable to other cloud providers • Kubernetes, containers, Anthos, automated deployment • Exit strategies using data escrow, open standards and fast exports
    Slide label: Portland Common Data Model
  • 95. Pilot: Long Term Digital Preservation Hosted On GCP
  • 96. Prototype: Factories for LTDP in Large Scale Science
  • 97. Prototype: Approach • Automation, Scalability and Efficiency: Preservation Factories • Minimal Effort Ingest / Minimal Viable Preservation • Dataset Authenticity, Integrity and Usability: FAIR • Platform for building Trusted Digital Repositories • Fully SaaS on GCP, but also portable to on-premise and hybrid deployments