Summary of the Deployment Scenarios and Functional Requirements

Presentation by Vaggelis Motesnitsalis, CERN

Published in: Software

  1. Summary of the Deployment Scenarios and Functional Requirements
     Evangelos Motesnitsalis, Technical Coordinator
     ARCHIVER Consolidation Event, 5 June 2019
  2. Contents
     • Recap
     • Common Characteristics
     • Service Layers Mapping
     • Testing Plans
     • Summary and Next Steps
  3. Recap of Deployment Scenarios
  4. High Energy Physics Deployment Scenarios
     The BaBar Experiment: In 2020 the BaBar Experiment infrastructure at SLAC will be decommissioned. As a result, the 2 PB of BaBar data can no longer be stored at the host laboratory and alternative solutions need to be found. A copy of the data is currently held by CERN IT. We want to ensure that a complete copy of the BaBar data is retained for possible comparisons with data from other experiments and for sharing through the CERN Open Data Portal.
     CERN Open Data Portal: The CERN Open Data Portal disseminates close to 2 PB of primary and derived datasets from particle physics as released by the LHC Collaborations, and is used for both education and research. The CERN Open Data Service Managers seek easy-to-use, easy-to-achieve independent archiving and backup for its holdings, based on SIPs (Submission Information Packages), with intelligent and reliable disaster recovery mechanisms.
     CERN Digital Memory: We want to archive the ~1.5 PB of CERN Digital Memory, containing digitized analog documents produced by the institution in the 20th century as well as the digital production of the 21st century, including new types such as web sites, social media, and emails.
  5. Life Sciences Deployment Scenarios
     EMBL on FIRE: EMBL-EBI provides data archiving services to the global molecular biology community. These data archives are currently based on an internal service (FIRE: FIle REplication) that stores the files in two different systems: a distributed object store and tape. FIRE currently holds 20 PB of data and is growing at 40% per year. We want to ensure that:
     • FIRE can achieve cost-effective scaling via cloud-based storage solutions
     • Data can effectively be distributed on cloud infrastructure, covering the increasing needs for cloud-hosted analysis
     EMBL Cloud Data Caching: As research communities access more and more internal data from cloud services for their data analysis, it makes sense to progressively cache data in the cloud, with the on-premises data being replicated and discarded as required. Which data should be cached, how much, and for how long will be a tradeoff between the cost of cloud storage and the cost of the network capacity/latency needed to download the data multiple times.
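Two pieces of arithmetic in the EMBL scenarios can be made concrete. The sketch below compounds FIRE's stated 20 PB at 40% per year, and compares the cost of keeping a dataset cached in the cloud against downloading it repeatedly. The function names and every price in the usage example are hypothetical placeholders, not figures from the presentation, and the cost model is deliberately simplified (it ignores latency and any request or retrieval fees).

```python
def projected_volume_pb(current_pb, annual_growth, years):
    """Compound growth: volume after `years` at a constant annual growth rate."""
    return current_pb * (1.0 + annual_growth) ** years


def caching_is_cheaper(size_tb, months_cached, storage_price_per_tb_month,
                       expected_downloads, transfer_price_per_tb):
    """Compare holding a cached copy against re-downloading it each time.

    All prices are hypothetical inputs; this simplified model ignores latency
    and any request or retrieval fees.
    """
    cache_cost = size_tb * months_cached * storage_price_per_tb_month
    download_cost = size_tb * expected_downloads * transfer_price_per_tb
    return cache_cost < download_cost


if __name__ == "__main__":
    # Figures from the slide: 20 PB today, growing at 40% per year.
    for year in range(6):
        print(year, round(projected_volume_pb(20.0, 0.40, year), 1))

    # Placeholder prices only, chosen to illustrate the comparison.
    print(caching_is_cheaper(size_tb=100, months_cached=6,
                             storage_price_per_tb_month=20.0,
                             expected_downloads=3,
                             transfer_price_per_tb=50.0))
```

At a constant 40% rate the archive roughly doubles every two years (1.4 squared is 1.96), which is the pressure behind the cost-effective scaling requirement.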
  6. Astronomy Deployment Scenarios
     The MAGIC Cherenkov gamma-ray telescopes and the PAUcam camera for the William Herschel Telescope are located at the Observatorio del Roque de los Muchachos, in the Canary Islands, Spain. The first Large Scale Telescope of the next-generation Cherenkov Telescope Array (CTA) is also there. Together they produce about 0.3 PB of raw data per year, which is automatically sent to PIC in Barcelona.
     PIC Large File Storage: We want to replace the current in-house tape library storage. Each instance of the service to be purchased is the 5-year safe-keeping of a yearly dataset from a single source.
     PIC Mixed File Remote Storage: We also want to be able to archive the derived datasets from at most two sources, which become part of the yearly dataset. In addition, at any time during the 4 years following the creation of the data, additional versions of derived datasets may need to be uploaded.
     PIC Data Distribution: We also want to replace the Hierarchical Storage Manager, disk storage, and data distribution service. Each instance of the service to be purchased is the 5-year safe-keeping and data distribution of a yearly dataset and its derived datasets.
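To give a feel for the scale, the sketch below converts the stated 0.3 PB of raw data per year into an average sustained transfer rate and into the raw volume held under a 5-year safe-keeping window. It assumes decimal petabytes, a 365-day year, and one new yearly dataset per year; derived datasets are not counted, and real transfers would be burstier than this average. The function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_PB = 10**15  # assumption: decimal petabyte
BYTES_PER_MB = 10**6   # assumption: decimal megabyte


def average_rate_mb_per_s(pb_per_year):
    """Average sustained rate needed to move a yearly volume, in MB/s."""
    return pb_per_year * BYTES_PER_PB / SECONDS_PER_YEAR / BYTES_PER_MB


def retained_raw_volume_pb(pb_per_year, retention_years):
    """Raw volume held once `retention_years` yearly datasets are in safe-keeping."""
    return pb_per_year * retention_years


if __name__ == "__main__":
    # 0.3 PB per year and 5-year safe-keeping are taken from the slide.
    print(round(average_rate_mb_per_s(0.3), 2), "MB/s on average")
    print(retained_raw_volume_pb(0.3, 5), "PB of raw data retained")
```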
  7. Photon Science Deployment Scenarios
     PETRA III is the world's most brilliant storage-ring-based X-ray source for high-energy photons, with 22 beamlines distributed over three experimental halls and concurrently available to users. The European XFEL is the world's largest X-ray laser, generating 27 000 ultrashort X-ray flashes per second with a brilliance a billion times higher than that of the best conventional X-ray radiation sources.
     PETRA III / EuXFEL – Individual Scientist: Individual scientists at DESY need a service to create archives for their experiment data and their publications, with specific capabilities such as data ingestion via browser or third-party copies.
     PETRA III / EuXFEL – Manual Data Archiving: Experiment managers want to be able to create, manage, and delete archives via APIs/CLIs based on accepted data policies, supporting a wide range of options for cloud and on-prem storage, while being able to use existing user credentials, authentication techniques, and identification mechanisms.
     PETRA III / EuXFEL – Integrated Data Archiving: Long-lived collaborations have a growing need to plan and execute archiving operations in a fully automated, policy-based, certified, and documented way, based on APIs.
  8. Common Characteristics
  9. FAIR Principles (https://www.go-fair.org/)
     Findable:
     • Global, unique identifiers
     • Rich metadata, indexes, search capabilities
     Accessible:
     • Retrievable with free protocols
     • Accessible metadata even after deletion
     Interoperable:
     • Qualified reference to other data
     • Formal, shared and broadly applicable knowledge representation standards
     Re-Usable:
     • Accurate and relevant description
     • Data usage license and detailed provenance
  10. OAIS Reference Model
  11. Common Characteristics
     • Scientific Data Storage in the PB Range
     • Solid needs for Federated AAI Services
     • Sustained Data Ingest Rates
     • Access to GEANT Network
     • Development under the OAIS Reference Model and FAIR Principles
     • Data Privacy and Compliance
     • Significant Monitoring Requirements
     • Sustainable Business Models and Costs
  12. Service Layers Mapping
  13. Service Layers and Deployment Scenarios Mappings
     Layer 1, Storage/Basic Archiving/Secure Backup: data integrity/security; cloud/hybrid deployment; data volume in the PB range; high, sustained ingest data rates; ISO certification (27000, 27040, 19086 and related standards); archives connected to the GEANT network.
     Layer 2, Preservation: OAIS-conformant services (data readability formats, normalization, obsolescence monitoring, file fixity, authenticity checks, etc.); ISO 14721/16393, 26324 and related standards.
     Layer 3, Baseline user services: search, discover, share, indexing, data removal, etc.; access under Federated IAM.
     Layer 4, Advanced services: high-level services such as visual representation of data (domain specific), reproducibility of scientific analyses, etc.
     Deployment scenarios mapped: EMBL1 – FIRE; PIC2 – Mixed File Remote Storage; DESY1 – PETRA III / EUXFEL; CERN3 – CERN Open Data; CERN2 – CERN Digital Memory; CERN1 – The BaBar Experiment; PIC3 – Data Distribution; EMBL2 – Cloud Caching; PIC1 – Large File Storage.
  14. Testing Plans
  15. Testing Plans
     • The Buyers Group will request demo access to the current product offerings during the Design Phase.
     • Testing will focus on functionality in the Prototype Phase and on performance, scalability, and reliability in the Pilot Phase.
     • The Buyers Group will provide a set of tests derived from the Buyers Group deployment scenarios and the Functional Specifications. The tests will have clear pass/fail assessment criteria.
     • The Buyers Group expects to deploy tests only after a clear indication from the contractor that the contractor has already run them successfully.
     • We plan to present the initial set of tests by the Design Phase kick-off.
     • Assessment of the test results will affect the assessment of the respective phase results and the payments to be executed.
  16. Basic Functionality Testing Examples
     • Ingestion: ability to submit a particular dataset of size X to the Archiving Service within time Y
     • Access: ability to recall a particular part of a file, a file, or a dataset within time Y
     • Monitoring and Dashboard: ability to access the displayed information via a web browser and trigger basic management functions, e.g. data deletion, fixity checks, etc.
     • Audit and Log: ability to access detailed access logs for a particular file/dataset
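A pass/fail check of the kind listed on this slide could look roughly like the sketch below. It is only an illustration: `InMemoryArchive` is a hypothetical stand-in for an archiving service (not any real product API), the time limit plays the role of "time Y", and fixity is checked with SHA-256 as one common choice of checksum, which the presentation does not specify.

```python
import hashlib
import time


class InMemoryArchive:
    """Hypothetical stand-in for an archiving service; not a real product API."""

    def __init__(self):
        self._objects = {}    # name -> stored bytes
        self._checksums = {}  # name -> SHA-256 digest recorded at ingest

    def ingest(self, name, data):
        self._objects[name] = data
        self._checksums[name] = hashlib.sha256(data).hexdigest()

    def recall(self, name, start=0, length=None):
        """Return the whole object, or `length` bytes starting at `start`."""
        data = self._objects[name]
        end = len(data) if length is None else start + length
        return data[start:end]

    def fixity_ok(self, name):
        """Recompute the checksum and compare it with the one stored at ingest."""
        current = hashlib.sha256(self._objects[name]).hexdigest()
        return current == self._checksums[name]


def timed(action, limit_seconds):
    """Run action(); return (result, True if it finished within the limit)."""
    started = time.monotonic()
    result = action()
    elapsed = time.monotonic() - started
    return result, elapsed <= limit_seconds


def run_basic_tests(archive, name, data, limit_seconds):
    """Ingestion, partial access, and fixity checks with pass/fail results."""
    _, ingest_ok = timed(lambda: archive.ingest(name, data), limit_seconds)
    part, access_ok = timed(lambda: archive.recall(name, 0, 16), limit_seconds)
    return {
        "ingestion": ingest_ok,
        "access": access_ok and part == data[:16],
        "fixity": archive.fixity_ok(name),
    }


if __name__ == "__main__":
    results = run_basic_tests(InMemoryArchive(), "dataset-1", b"x" * 1024, 5.0)
    for test_name, passed in results.items():
        print(test_name, "PASS" if passed else "FAIL")
```

Against a real service, the same structure would apply with the stand-in replaced by calls to the service's own interface and with X and Y taken from the agreed test plan.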
  17. Summary
  18. Overview
     • C3 – CERN Open Data
     • C1 – The BaBar Experiment
     • C2 – CERN Digital Memory
     • P1 – Large File Remote Storage
     • P3 – Data Distribution
     • P2 – Mixed File Remote Storage
     • E1 – FIRE
     • E2 – Cloud Caching
     • D1 – PETRA III / EUXFEL
  19. Summary and Next Steps
     The primary goal for all the Deployment Scenarios is the preservation and long-term archiving of data in the PB range, with high sustained ingest rates for complex data types. If this can be achieved easily, all the scenarios would benefit greatly from an added Software Reproducibility and Open Data Distribution Layer on top of the archiving solution.
     These deployment scenarios have many similarities, such as complex scientific data types, the need for federated AAI services, significant monitoring requirements, and development under OAIS and FAIR.
     We welcome your feedback on the draft of the "Functional Specifications" documents until 14 June.
     The Buyers Group will co-design and co-develop a test plan with you:
     • The plan will be based on the outcome of the Design Phase, the Functional Specifications document, and the needs of the Deployment Scenarios
     • The test assessment will be a deciding factor in qualifying solutions for the subsequent phases
     • The tests will focus on basic functionality during the Prototype Phase and on performance, efficiency, and scalability during the Pilot Phase
