This presentation explored the use cases driving ARCHIVER with the audience gathered at INFN in Rome, Italy, for the Open Data Ecosystem and CS3 conference, 27-29 January 2020.
On 29 January 2020 ARCHIVER launched its Request for Tender, with the purpose of awarding several Framework Agreements and work orders for the provision of R&D for hybrid end-to-end archival and preservation services that meet the innovation challenges of European research communities, in the context of the European Open Science Cloud.
The tender closed on 28 April 2020 and 15 R&D bids were submitted, from consortia that included 43 companies and organisations. The best bids were selected to start the first phase of the ARCHIVER R&D (Solution Design) in June 2020.
On Monday 8 June, the selected consortia for the ARCHIVER design phase were announced during a Public Award Ceremony starting at 14.00 CEST.
In light of the COVID-19 outbreak and the consequent movement restrictions imposed in several countries, the event was organised as a webinar, virtually hosted by Port d’Informació Científica (PIC), a member of the Buyers Group of the ARCHIVER consortium.
The Kick-off marks the beginning of the Solution Design Phase.
Prototype Phase Kick-off Event and Ceremony (Archiver)
On Monday 7 December 2020, the selected consortia for the ARCHIVER prototype phase were announced during a Public Award Ceremony.
The Kick-off marks the beginning of the Prototype Implementation Phase, in which the three consortia selected to move forward will build prototypes of their solutions, including all components; basic functionality, interoperability, and security tests will be performed by IT specialists from the Buyers Group.
Tutorial on Hybrid Data Infrastructures: D4Science as a case study (BlueBRIDGE)
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations, allowing scientists at distant locations to collaborate. They may offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction. E-Infrastructures can have different implementations (Andronico et al. 2011). A major distinction is between (i) Data e-Infrastructures, i.e. digital infrastructures promoting data sharing and consumption within a community of practice (e.g. MyOcean, Blanc 2008), and (ii) Computational e-Infrastructures, which support the processes required by a community of practice using GRID and Cloud computing facilities (e.g. Candela et al. 2013). A more recent type of e-Infrastructure is the Hybrid Data Infrastructure (HDI) (Candela et al. 2010), i.e. a Data and Computational e-Infrastructure that adopts a delivery model for data management in which computing, storage, data and software are made available as-a-Service. HDIs support, for example, data transfer, data harmonization and data processing workflows. Hybrid Data e-Infrastructures have already been used in several European and international projects (e.g. i-Marine 2011; EuBrazil OpenBio 2011) and their exploitation is growing fast, supporting new projects and initiatives, e.g. Parthenos, Ariadne, Descramble.
A particular HDI, named D4Science (Candela et al. 2009), has been used by communities of practice in the fields of biodiversity conservation, geothermal energy monitoring, fisheries management, and cultural heritage. This e-Infrastructure hosts models and resources from several international organizations involved in these fields. Its capabilities help scientists to access and manage data, reuse data and models, obtain results in a short time, and share these results with colleagues.
Presented by Peter Burnhill at "e-Journals are forever? Preservation and Continuing Access to e-journal Content", a DPC, EDINA and JISC joint initiative, British Library, London, 26 April 2010.
Reference Model for an Open Archival Information System (OAIS): Overview and... (faflrt)
ALA/FAFLRT Workshop on the Open Archival Information System (OAIS). Presented by Alan Wood (A.E. Wood & Erickson / Lockheed Martin), Don Sawyer (NASA/GSFC), and Lou Reich (CSC). Sponsored by the ALA Federal and Armed Forces Libraries Roundtable (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
Phidias: Steps forward in detection and identification of anomalous atmospher... (Phidias)
PHIDIAS organised a webinar entitled "Steps forward in detection and identification of anomalous atmospheric events", held on 13 October 2020 at 15:00 CEST in collaboration with the ESCAPE project. The webinar showcased how PHIDIAS is going to improve the usage of HPC and high-performance data management services for the development of intelligent screening approaches for the exploitation of large amounts of satellite atmospheric data in an operational context.
This presentation, given by Bob Jones, CERN & HNSciCloud Coordinator, at the ESA-ESPI Workshop on “Space Data & Cloud Computing Infrastructures: Policies and Regulations”, describes the challenges and needs of cloud users and explains how a hybrid cloud model can support them.
The BlueBRIDGE approach to collaborative research (BlueBRIDGE)
Gianpaolo Coro, ISTI-CNR, at the BlueBRIDGE workshop on "Data Management services to support stock assessment", held during the Annual ICES Science Conference 2016.
BioDT for the UiO Science section meeting 2023-03-24 (Dag Endresen)
Presentation of the Biodiversity Digital Twin (BioDT) project for the University of Oslo (UiO) Natural History Museum (NHMO) Science department on 2023-03-24.
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub (Björn Backeberg)
This presentation was given during the Japan Geosciences Union 2019. Session details can be found at http://www.jpgu.org/meeting_e2019/SessionList_en/detail/M-GI31.htm
ARCHIVER pilot phase kick-off Award Ceremony (Archiver)
In the framework of the ARCHIVER pre-commercial procurement tender, between December 2020 and August 2021 three consortia worked on innovative prototype solutions for long-term data preservation, in close collaboration with CERN, EMBL-EBI, DESY and PIC. The selection process for proceeding to the next phase is now complete, and the consortia selected to continue with the pilot phase were officially announced at a public ceremony on 29 November 2021.
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Keynote presentation given at "The Emerging Technology Forum – Data Creates Universe - Scientific Data Innovation Conference" of the "Pujiang Innovation Forum 2021" event.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and their capacity to enable complex behavior composed of discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
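The sampling strategies mentioned above can be made concrete. Below is a minimal, illustrative Python sketch (the feature names are invented for the example, not taken from the talk) of enumerating a binary configuration space, such as a feature model's products, and drawing a uniform random sample from it:

```python
import itertools
import random

# Hypothetical binary features of a software build; a real feature model
# would also carry constraints between features.
FEATURES = ["lto", "debug_symbols", "vectorize", "static_link"]

def all_configs(features):
    """Enumerate the full configuration space (2^n configurations)."""
    return [dict(zip(features, bits))
            for bits in itertools.product([False, True], repeat=len(features))]

def uniform_sample(features, k, seed=0):
    """Uniform random sampling: draw k distinct configurations."""
    rng = random.Random(seed)
    return rng.sample(all_configs(features), k)

space = all_configs(FEATURES)
sample = uniform_sample(FEATURES, 4)
print(len(space))  # 16 = 2^4 configurations in total
```

In practice the space is far too large to enumerate, which is exactly why the sampling, transfer-learning and feature-selection techniques discussed in the talk matter.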
Invited talk at the Journées Nationales du GDR GPL 2024.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional foods, beverages, and dietary supplements that provide health benefits beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition are further driving market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to offer significant opportunities for research and investment across a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Richard's adventures in two entangled wonderlands (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol..." (Studia Poinsotiana)
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
What are greenhouse gases, and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are weather and climate affected?
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
2. Project Objective
Focus: archiving and data preservation services using commercial cloud services, to be available via the European Open Science Cloud (EOSC)
Procurement R&D budget: 3.4M euro
Starting date: 1 January 2019
Duration: 36 months
Coordinator: CERN (Lead Procurer)
3. Consortium
Includes Buyers and Experts in the preparation, execution and promotion of the Procurement of R&D.
Procurers: public organisations committing funds to contribute to a joint R&D procurement, research data use cases and the R&D testing effort.
Experts: partner organisations bringing expertise in requirement assessment and promotion activities, not part of the Buyers Group.
4. Preferred Partners (Early Adopters)
• Confirmed subscriptions received from 11 organisations: a high level of interest from the community
• Participants: information webinar for demand-side public sector organisations (4 September): 47 participants
• Key advantages:
• Assess whether the resulting services address archiving and preservation needs
• Contribute to and shape the R&D carried out in the project, contribute use cases, and have the option to purchase pilot-scale services by the end of the project
5. Challenge
• Demonstrate services for long-term preservation and archiving of scientific data in the PB range
• F.A.I.R. archiving services following best practices and standards
• Expand resulting services to several scientific domains
• Transparent business models; make resulting services available through the EOSC catalogue

Current Status of Scientific Data Repositories
• Basic bit preservation and archiving capabilities
• Data volumes and communities are growing
• Longstanding archiving and preservation activities, but most data not yet published
• Fragmentation across scientific disciplines, with underestimation of costs at the planning phase
6. R&D Scoping Activities and Dialogue with the Private Sector
Continuous updates to the FAQ, consortium matchmaking, and Early Adopters engagement throughout.
Open Market Consultation (OMC) process, with feedback integration after each event:
• 8 February: 1st Preparation Workshop
• 20 February: 2nd Preparation Workshop
• 8 April: OMC Kick-off, CERN
• 7 May: OMC Event, Barcelona
• 23 May: OMC Event, Stansted
• 5 June: OMC Consolidation, CERN
Feedback on the Draft PCP Contract Notice.
7. Geographical Distribution of Companies
Wide geographical distribution: 42 companies, the majority from 12 European countries.
8. Outcome: R&D Challenge
Cost-effective business model, taking into account: scale, ingest rates, archive lifetime, number of copies, exit strategies, portability, SLAs.
Regulation and legislation: auditing, self-assessment, data retention, GDPR.

• Layer 1 – Storage / Basic Archiving / Secure backup (Core R&D): data integrity/security; cloud/hybrid deployment; data volume in the PB range; high, sustained ingest data rates; ISO 27000, 27040, 19086 and related certifications; archives connected to the GEANT network.
• Layer 2 – Preservation (Core R&D): OAIS-conformant services: data readability formats, normalization, obsolescence monitoring, file fixity, authenticity checks, etc.; ISO 14721/16363, 26324 and related standards.
• Layer 3 – Baseline user services (Core R&D): search, discover, share, indexing, data removal, etc.; access under federated AAI.
• Layer 4 – Advanced services (Bonus R&D): high-level services: visual representation of data (domain specific), reproducibility of scientific analyses using machine learning algorithms, etc.

Scientific use case deployments documented at: https://www.archiver-project.eu/deployment-scenarios
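The file-fixity checks expected of the preservation layer can be illustrated with a minimal sketch. This is not ARCHIVER code; it assumes a simple manifest mapping file paths to SHA-256 digests recorded at ingest time, in the spirit of a BagIt-style manifest:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fixity(manifest):
    """Compare each file against the digest recorded at ingest time.

    `manifest` maps file paths to expected SHA-256 hex digests; returns
    a dict of path -> bool (True if the file is still intact).
    """
    return {path: sha256_of(path) == expected
            for path, expected in manifest.items()}
```

A production archive would run such checks on a schedule and log failures as preservation events; the streaming read keeps memory use constant even for PB-scale holdings checked file by file.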
9. High Energy Physics
The BaBar Experiment
During this year, the BaBar Experiment infrastructure at SLAC will be decommissioned, and 2 PB of BaBar data can no longer be stored at the host laboratory. Currently a copy of the data is being held by CERN IT-ST.
Goal: To ensure that a complete second copy of BaBar data is retained for possible comparisons with data from other experiments, and shared through the CERN Open Data Portal.

CERN Open Data Portal
The CERN Open Data portal disseminates close to 2 PB of primary and derived datasets from particle physics as they are released by the LHC collaborations, and is used for both education and research purposes.
Goal: Achieve total reproducibility of research, being able to completely instantiate data, associated software and services off-premise. Offer research reproducibility services to individual researchers running open data analyses completely independently of the original on-premise infrastructure.

CERN Digital Memory
Deployment consisting of a requirement to archive approximately 1.5 PB of digital memory, containing analogue documents produced by the Organization in the 20th century as well as the digital production of the 21st century (web sites, social media, emails, etc.).
Goal: Produce a dark archive in the cloud following standard OAIS practices.
10. Life Sciences
EMBL on FIRE
EMBL-EBI provides data archiving services to the global molecular biology community. These data archives are currently based on an internal service (FIRE: FIle REplication). FIRE currently holds 20 PB of data and is growing at 40% per year.
Goal: Cost-effective scaling via cloud-based storage solutions. Distribute data effectively in the cloud, covering the increasing need for cloud-hosted analysis.

EMBL Cloud Data Caching
Life sciences research communities increasingly access internal data from public cloud services for their data analysis.
Goal: To progressively cache data in the cloud, with the on-premises data being replicated and discarded as required. Which data should be cached, how much, and for how long will be a trade-off between the cost of cloud storage and of having the network capacity/latency to download the data multiple times.

Scientific use case deployments documented at: https://www.archiver-project.eu/deployment-scenarios
11. Astronomy
The MAGIC Cherenkov gamma-ray telescopes and the PAUcam camera for the William Herschel Telescope are located at the Observatorio del Roque de los Muchachos in the Canary Islands, Spain. The first Large-Sized Telescope of the next-generation Cherenkov Telescope Array (CTA) is also there. They produce about 0.3 PB of raw data per year, which is automatically sent to PIC in Barcelona.

PIC Large File Storage
Goal: To replace the current in-house tape library storage. Each instance of the service to be purchased is the 5-year safe-keeping of a yearly dataset from a single source.

PIC Mixed File Remote Storage
Goal: To archive the derived datasets from at most two sources, becoming part of the yearly dataset. In addition, allow update/upload of derived datasets for a period of 4 years following the creation of the data.

PIC Data Distribution
Goal: To replace the Hierarchical Storage Manager, disk storage and data distribution service. Each instance of the service to be purchased is the 5-year safe-keeping and data distribution of a yearly dataset and its derived datasets.
12. Photon Sciences
PETRA III is the world's most brilliant storage-ring-based X-ray source for high-energy photons, with 22 beamlines distributed over three experimental halls. The European XFEL is the world's largest X-ray laser, generating 27,000 ultrashort X-ray pulses per second with a brilliance a billion times higher than that of the best conventional X-ray radiation sources. The two facilities produce several tens of PB of raw data per year, and this is expected to double in size every year.
Goal: Develop a hybrid model that combines current on-premise archiving services with the resulting services of ARCHIVER. Move a predefined set of datasets into public clouds and make them open for public access.

Scientific use case deployments documented at: https://www.archiver-project.eu/deployment-scenarios
13. Definition of the R&D Scope
Deployments derived from 4 ESFRI landmarks: CTA, ELIXIR, EuXFEL and HL-LHC.
• Layer 1 – Storage / Basic Archiving / Secure backup: data integrity/security; cloud/hybrid deployment; data volume in the PB range; high, sustained ingest data rates; ISO 27000, 27040, 19086 and related standards; archives connected to the GEANT network.
• Layer 2 – Preservation: OAIS-conformant services: data readability formats, normalization, obsolescence monitoring, file fixity, authenticity checks, etc.; ISO 14721/16363, 26324 and related standards.
• Layer 3 – Baseline user services: search, discover, share, indexing, data removal, etc.; access under federated IAM.
• Layer 4 – Advanced services: high-level services: visual representation of data (domain specific), reproducibility of analyses using machine learning algorithms, etc.

Use case deployments mapped onto these layers: CERN1 – The BaBar Experiment; CERN2 – CERN Digital Memory; CERN3 – CERN Open Data; EMBL1 – FIRE; EMBL2 – Cloud Caching; PIC1 – Large File Storage; PIC2 – Mixed File Remote Storage; PIC3 – Data Distribution; DESY1 – PETRA III / EuXFEL.

Scientific use case deployments documented at: https://www.archiver-project.eu/deployment-scenarios
16. Role of the EOSC
To ensure that 1.7 million European researchers and 70 million professionals in science and technology reap the full benefits of data-driven science:
• A federated virtual environment, free at the point of use for the end researcher
• Open services for storage, analysis and re-use of research data
• Promote an approach across national borders and scientific disciplines
• Promote choice of deployment model: on-prem, hybrid, off-prem
EOSC Phase 1 investment of EUR 300 million in core services. An EOSC legal entity is expected to be created by the end of 2020.

European Open Science Cloud – The Vision
“We are creating a European Open Science Cloud now. It is a trusted space for researchers to store their data and to access data from researchers from all other disciplines. We will create a pool of interlinked information, a ‘web of research data’. (…) The idea is that once we have the rules of the game ready, then we will open this up to the broader public sector and to business as well. So that companies can come in, store the data and use the data.”
Special Address by Ursula von der Leyen, President of the European Commission, 22 January 2020, WEF Davos
https://www.youtube.com/watch?v=QN476nVbFVs&feature=youtu.be&t=682
17. EOSC – Engagement of commercial providers
EOSC should provide a level playing field: the same requirements for commercial and not-for-profit providers.
• Accept commercial services in Data Management Plans
• Stay mainstream and interoperable by adopting widely used and internationally recognized standards
• Promote choice and an ecosystem for innovation, fostering data self-determination and digital sovereignty in Europe
18. ARCHIVER key contributions to the EOSC
Long-term archiving and preservation of research data at the core of the EOSC strategy.
ARCHIVER is key in defining EOSC Rules of Participation for the private sector, both as service providers and as R&D partners for “close to market” solutions.
A list of 40+ EOSC projects is available at: https://www.eosc-portal.eu/about/eosc-projects
19. ARCHIVER services in the European Open Science Cloud
Objective: to make the resulting services available in the European Open Science Cloud catalogue over the 2019–2024 project timeline.
What does this really mean?
Atomic use case: “As a researcher, I want to have access to the full set of ARCHIVER services, so that I’m able to evaluate their functionality for my specific research field, able to purchase them with a clear cost model, and implement an exit strategy to be able to repatriate or move my research data seamlessly to another location by the end of the contract and usage period.”
20. ARCHIVER Data Management Strategy
• DMPs both for research use cases and for the project itself
• FAIR guiding principles
• Post-GDPR era: strong focus on technical and organisational measures for data privacy and protection as the path to digital sovereignty
• Modern guidelines provided by Science Europe for pan-European Research Data Management (RDM), establishing core requirements for research DMPs and a set of criteria to assess trustworthy repositories:
https://www.scienceeurope.org/our-resources/practical-guide-to-the-international-alignment-of-research-data-management/
21. ARCHIVER EOSC Services – Technical Validation (I)
• Resource provisioning using the Terraform API
• OSS container orchestration systems: automated deployment based on Kubernetes and Docker
• Results stored back at the CERN S3 cloud storage service
• Available on GitHub under an OSS license: https://ocre-testsuite.readthedocs.io/en/latest/
• Tests provided by the research community: https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/
• Started in HNSciCloud, already in use in OCRE, reused and expanded in ARCHIVER
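The provisioning step above can be sketched programmatically: Terraform accepts configurations in JSON syntax (`*.tf.json` files), so a test harness can generate them on the fly. The sketch below is illustrative only; the provider and resource names (an OpenStack compute instance) are assumptions, not the actual ARCHIVER/OCRE configuration.

```python
import json

def make_tf_config(instance_name: str, image: str, flavor: str) -> dict:
    """Build a minimal Terraform JSON configuration for one VM.

    The resource type below (an OpenStack compute instance) is an
    illustrative assumption; the real test suite targets multiple clouds.
    """
    return {
        "resource": {
            "openstack_compute_instance_v2": {
                instance_name: {
                    "name": instance_name,
                    "image_name": image,
                    "flavor_name": flavor,
                }
            }
        }
    }

# Terraform reads JSON-syntax configs from files ending in .tf.json
config = make_tf_config("archiver-test-node", "ubuntu-20.04", "m1.medium")
print(json.dumps(config, indent=2))
```

In practice `terraform init` / `terraform apply` would then be run against the generated file, and Kubernetes would deploy the containerised tests onto the provisioned nodes.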
22. ARCHIVER EOSC Services – Legal & Organisational (II)
• Data Privacy and Protection (GDPR) – an EC pillar
• Personal data is handled in several components (federated AAI, the research data itself)
• Technical & organisational measures: a “Privacy by Design” approach for GDPR conformance
• Best practices and standards:
• Infrastructure: ISO 27001 series, European Cybersecurity Act
• Long-term data preservation: OAIS and CoreTrustSeal
• Self-assessment: definition of responsibilities across data stewards and service providers
• Exit strategies:
• Stimulate the use of Open Source and Open APIs as measures to prevent vendor lock-in
• Definition and field testing of viable exit plans (provider-to-on-prem & provider-to-provider)
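A viable exit plan ultimately reduces to: move every object out of the provider and prove nothing was lost. The sketch below illustrates that idea with in-memory dicts as hypothetical stand-ins for real object-storage APIs; the function names are not from ARCHIVER.

```python
import hashlib

def sha256(data: bytes) -> str:
    """Hex digest used to verify object integrity after transfer."""
    return hashlib.sha256(data).hexdigest()

def migrate_and_verify(provider_store: dict, onprem_store: dict) -> bool:
    """Copy all objects from a provider to on-prem storage, then verify
    each object's checksum (the core of an exit-plan field test).
    Returns True only if every object arrived intact."""
    for key, data in provider_store.items():
        onprem_store[key] = data  # stand-in for a real download/upload
    return all(
        sha256(onprem_store[key]) == sha256(data)
        for key, data in provider_store.items()
    )

provider = {"dataset-001/file.raw": b"raw detector data"}
onprem: dict = {}
assert migrate_and_verify(provider, onprem)
```

The same shape applies to the provider-to-provider path, with a second remote store in place of the on-prem dict.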
23. EOSC Services – Financial & Business Model (III)
• Establish a range of sustainable, cost-effective purchasing options
• Must allow organisations or individual researchers to keep storing their data after the end of a procurement cycle or research grant
• Requirement for service providers to establish a “Total Cost of Services” study, from the architecture phase (Design) through Prototype and Pilot
• “Total” TCS: must include all factors that a research organisation or individual researcher will bear when running the resulting ARCHIVER services over a defined period
• Strongly connected to exit strategies
• Escrow services concept: public research organisations as data stewards – a protection factor against scenarios such as vendor lock-in or supplier bankruptcy
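A “Total Cost of Services” estimate of this kind can be sketched as a simple model over a defined period. The cost categories below (recurring storage, one-off egress at exit, fixed yearly fees) and the prices are illustrative assumptions, not an official ARCHIVER pricing model; the egress term is what ties the study to the exit strategy.

```python
def total_cost_of_services(years: float,
                           storage_tb: float,
                           storage_eur_tb_month: float,
                           egress_eur_tb: float,
                           expected_egress_tb: float,
                           fixed_eur_year: float = 0.0) -> float:
    """Illustrative TCS estimate over a defined period (EUR).

    Cost categories are assumptions for the sketch:
    - recurring storage, billed per TB per month
    - one-off egress when repatriating data at contract end
    - any fixed yearly fees (support, compliance audits, ...)
    """
    storage = storage_tb * storage_eur_tb_month * 12 * years
    egress = expected_egress_tb * egress_eur_tb  # data repatriation at exit
    fixed = fixed_eur_year * years
    return storage + egress + fixed

# e.g. 1 PB (1024 TB) for 4 years at 5 EUR/TB/month,
# full repatriation at 10 EUR/TB when the contract ends
cost = total_cost_of_services(4, 1024, 5.0, 10.0, 1024)
print(cost)  # → 256000.0
```

A researcher can then compare providers on the whole-period figure rather than on the monthly storage price alone.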
24. Summary
• ARCHIVER aims to develop a set of commercial, FAIR archiving and preservation services for research data
• Petabyte-scale research data (tens of petabytes and beyond) in multiple scientific domains
• Open, trustworthy, aligned with best practices (ISO, OAIS, CoreTrustSeal)
• Strong preference for Open Source Software and Open Standards as measures to prevent vendor lock-in
• Set of “derived rules” for onboarding commercial services in the EOSC
• Technical: extensive field testing, “research data ready” archiving and preservation services
• Legal: GDPR as an opportunity for high-quality digital services guaranteeing digital sovereignty
• Financial: models adapted to research, considering public procurement cycles and research grant periods, allowing effective cost planning for LTDP
• ARCHIVER R&D activity will start at the end of Q1 2020
• The tender for R&D services opens in less than 48 hours, with submission of R&D bids from the 31st of January (submission period of 2 months)
• Info Session Webinars: February 7th and March 18th
• Selected companies will provide R&D services in 3 phases: Design (2020), Prototype (2020/2021) and Pilot (2021)
27. Synergies with CS3Mesh4EOSC (CS3Mesh Kick-off)
• Test Suite
• Maybe CS3Mesh can profit from the ARCHIVER test suite, available on GitHub:
• https://github.com/cern-it-efp/OCRE-Testsuite
• A simple framework using Terraform for resource provisioning and Kubernetes for resource orchestration, as a standard way to onboard scientific use-case tests
• APIs
• Would it make sense to define and test together the API implementations that will be available in the resulting ARCHIVER repository services?
• Preference is given to open, general-purpose APIs
• Testing data workflows from CS3 -> ARCHIVER based on “data temperature”?
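The “data temperature” idea above is that data migrates across tiers as it cools, i.e. as it is accessed less often: hot data stays on sync-and-share storage (the CS3 side), while cold data becomes a candidate for the archival tier. A minimal sketch of such a tiering policy follows; the thresholds and tier names are illustrative assumptions.

```python
from datetime import datetime, timedelta

def data_temperature(last_access: datetime,
                     now: datetime,
                     warm_after_days: int = 30,
                     cold_after_days: int = 365) -> str:
    """Classify a dataset by 'temperature' from its last access time.

    Illustrative policy: recently touched data is 'hot' (sync-and-share
    storage), untouched data cools to 'warm' and finally 'cold', at
    which point it would be handed to an archival service.
    """
    age_days = (now - last_access).days
    if age_days >= cold_after_days:
        return "cold"
    if age_days >= warm_after_days:
        return "warm"
    return "hot"

now = datetime(2020, 6, 1)
assert data_temperature(now - timedelta(days=2), now) == "hot"
assert data_temperature(now - timedelta(days=400), now) == "cold"
```

A CS3-to-ARCHIVER workflow would run such a policy periodically and trigger ingest for every dataset that turns cold.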
28. ARCHIVER Test Catalogue
The initial test catalogue includes tests on network, storage and compute:
https://github.com/cern-it-efp/OCRE-Testsuite/
More tests being included:
• FAIRsFAIR project evaluation of “FAIRness”: https://fairsharing.github.io/FAIR-Evaluator-FrontEnd/
• Tests provided by the research community, using the Science Europe assessment criteria guidelines
• Data ingestion: tests of the ingest process, ingests with incremental changes for high volumes of data, and the lifecycle from archive package creation, combination and/or aggregation to final archival for long-term preservation
• Open APIs: prevent vendor lock-in and create innovative workflows (CS3MESH?)
• Extensive testing of exit plans during R&D execution: provider-to-on-prem / provider-to-provider
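An incremental-ingest test of the kind listed above can be sketched as comparing per-file checksums against the manifest of the previous ingest, so that only new or changed files go into the next archive package. The manifest format and function names below are hypothetical, not the ARCHIVER test suite's actual interfaces.

```python
import hashlib

def build_manifest(files: dict) -> dict:
    """Map each file path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}

def incremental_ingest(files: dict, previous_manifest: dict) -> list:
    """Return the paths that belong in the next archive package:
    files that are new, or whose content changed since the last ingest."""
    current = build_manifest(files)
    return sorted(path for path, digest in current.items()
                  if previous_manifest.get(path) != digest)

# First ingest: everything is new; later ingests ship only the delta.
v1 = {"run1.dat": b"aaa", "run2.dat": b"bbb"}
manifest_v1 = build_manifest(v1)
v2 = {"run1.dat": b"aaa", "run2.dat": b"BBB", "run3.dat": b"ccc"}
print(incremental_ingest(v2, manifest_v1))  # → ['run2.dat', 'run3.dat']
```

For high data volumes this delta would then be combined and/or aggregated into archive packages before final archival, as the catalogue entry describes.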