Published in: Technology

  1. 1. EUDAT B2SAFE (@EOSC_eu). EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
  2. 2. Content (3/5/2019)
       - Motivation and driving considerations about the service
       - Service architecture and interfaces: overview
         - How the user can access the service, e.g. REST, GUI, CLIs
         - Service options and attributes
       - Acceptable Usage Policy (AUP)
       - Use cases
       - Documentation/tutorial/information
  3. 3. Motivation. A research community wants to improve the services offered to its users, guaranteeing that:
       - their data will still be available after years;
       - the data are easily accessible both to a researcher with just a browser and to a data manager who needs to transfer massive amounts of data;
       - the data are easily discoverable through a well defined set of metadata attributes and tools;
       - the data can be moved to computing resources when needed, and back;
       - these improvements will not disrupt the user workflows, because they will be implemented transparently through a seamless integration with the current community services, which will enforce the authorization policies defined by the community.
     Keywords: data planning; long term data preservation; data curation; data access; data discovery; well defined API and protocols; data transfer.
  4. 4. Distributed architecture 1. EUDAT infrastructure (CDI):
       - Different administrative domains
       - We need to federate them to offer common user management
       - Easy access
  5. 5. Distributed architecture 2: storage systems, user interfaces, federation
  6. 6. Architecture 1
  7. 7. Architecture 2. EUDAT has built an additional layer on top of iRODS to streamline the processes that support replication and long term data archiving. iRODS + EUDAT B2SAFE package + back-end storage = B2SAFE service.
  8. 8. EUDAT B2SAFE package = rules + scripts
  9. 9. B2SAFE package: rules and scripts
  10. 10. B2SAFE Data Policy Manager (DPM) architecture
  11. 11. Interfaces towards other services: DPM, metadata, PID, catalog
  12. 12. Interfaces towards other services: data flow 1. Community data. Policies:
       - data are stored according to the rules defined by the community
       - data are identified
       - data are registered
       - data are made discoverable
       - data can be easily retrieved
       - data can be easily moved
       - data are secured
  13. 13. Interfaces towards other services: data flow 2
       - Data are stored according to the rules defined by the community
       - Data are identified
       - Data are made discoverable
       - Data are registered
       - Data can be easily retrieved
       - Data are secured
       - Data can be easily moved
     In detail:
       - A set of EUDAT rules is defined: they implement the most common data flows. Community specific rules are added when needed.
       - Persistent Identifiers (PIDs) are associated with the data and registered in the B2HANDLE service.
       - PIDs are globally resolvable and can be used in the B2SHARE and B2STAGE services.
       - Data are replicated according to the defined policy across different nodes of the EUDAT CDI, making them tolerant to single node failures and single copy corruption.
       - The HTTP API and GridFTP allow data to be downloaded and uploaded using standard protocols.
     Keywords: long term data preservation; data discovery; data transfer.
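The tolerance to single copy corruption mentioned on this slide can be illustrated with a checksum comparison between a master copy and its replicas. This is only a minimal sketch: the slides do not say which checksum algorithm or rule B2SAFE uses, so SHA-256, the function names and the node names are all assumptions made for illustration.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupted_replicas(master: bytes, replicas: dict[str, bytes]) -> list[str]:
    """Return the (sorted) names of nodes whose replica differs from the master."""
    expected = sha256_of(master)
    return sorted(
        node for node, data in replicas.items() if sha256_of(data) != expected
    )


# Hypothetical node names and contents, for illustration only.
master_copy = b"sea temperature record"
replicas = {
    "node-a": b"sea temperature record",
    "node-b": b"sea temperature rec0rd",  # simulated single copy corruption
    "node-c": b"sea temperature record",
}

corrupted = find_corrupted_replicas(master_copy, replicas)
print(corrupted)
```

A node reported here could then be repaired from any replica whose checksum still matches, which is what makes several copies tolerant to the corruption of one.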
  14. 14. Interfaces towards other services: Data Policy Manager 1. DPM functions: definition of policies for data management; policies life cycle management; policies translation; policies enforcement; user authentication; resource provider feedback. Actors: data manager, resource provider.
  15. 15. Interfaces towards other services: Data Policy Manager 2
       - User authentication: DPM relies on B2ACCESS for authentication through the Shibboleth protocol.
       - Definition of policies for data management: policies are implemented as XML documents, which can be created through a web portal.
       - Policies life cycle management: policies are stored in an XML DB and identified through a unique id. They can be modified and removed.
       - Policies translation: the policies, described in a high level language, are translated into B2SAFE rules.
       - Policies enforcement: the B2SAFE rules are scheduled according to the policy trigger and executed by the rule engine.
       - Resource provider feedback: the status of the policy is reported back to the data manager. It can be waiting in a queue, enforced, rejected by the resource provider, or completed.
     Keywords: data curation.
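The four policy states reported back to the data manager can be modelled as a small state machine. The sketch below is hypothetical: the state names come from the slide, but the allowed transitions are an assumption, since the slides list the states without stating their order.

```python
from enum import Enum


class PolicyStatus(Enum):
    QUEUED = "waiting in a queue"
    ENFORCED = "enforced"
    REJECTED = "rejected by the resource provider"
    COMPLETED = "completed"


# Assumed transitions (not stated in the slides): a queued policy is either
# enforced or rejected, and an enforced policy eventually completes.
ALLOWED_TRANSITIONS = {
    PolicyStatus.QUEUED: {PolicyStatus.ENFORCED, PolicyStatus.REJECTED},
    PolicyStatus.ENFORCED: {PolicyStatus.COMPLETED},
    PolicyStatus.REJECTED: set(),
    PolicyStatus.COMPLETED: set(),
}


def advance(current: PolicyStatus, new: PolicyStatus) -> PolicyStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.name} to {new.name}")
    return new


status = PolicyStatus.QUEUED
status = advance(status, PolicyStatus.ENFORCED)
status = advance(status, PolicyStatus.COMPLETED)
print(status.value)
```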
  16. 16. How to access the service 1
       - iRODS icommands: a set of CLI commands which can be deployed through RPM or DEB packages.
       - Davrods: a WebDAV interface on top of iRODS.
       - The B2STAGE service offers two interfaces for B2SAFE:
         - the GridFTP iRODS-DSI, to enable fast data transfer through the GridFTP protocol;
         - the HTTP API, to provide a RESTful interface towards EUDAT services.
  17. 17. The GridFTP iRODS-DSI
       ● DSI (Data Storage Interface): GridFTP can be extended to support different underlying storage systems
       ● Implemented using the iRODS C API
       ● Supports the main iRODS operations (get, put, delete, list, checksum calculation)
     The GridFTP iRODS-DSI allows users to manage data on EUDAT nodes (B2SAFE) through any standard GridFTP client. Clients shown: UberFTP, Globus Online, globus-url-copy, WebFTS, FTS3 (REST, CLI).
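Because the DSI sits behind a standard GridFTP endpoint, a client such as globus-url-copy addresses B2SAFE with an ordinary gsiftp:// URL. The sketch below only builds the command line and does not run it. The host name and the iRODS path are hypothetical, and the helper name is an assumption; 2811 is the default GridFTP control port.

```python
from urllib.parse import quote


def gridftp_upload_command(
    local_path: str, host: str, remote_path: str, port: int = 2811
) -> list[str]:
    """Build a globus-url-copy invocation: source URL first, then destination.

    local_path and remote_path are expected to be absolute paths.
    """
    source = "file://" + quote(local_path)
    destination = f"gsiftp://{host}:{port}{quote(remote_path)}"
    return ["globus-url-copy", source, destination]


# Hypothetical endpoint and iRODS collection path, for illustration only.
command = gridftp_upload_command(
    "/tmp/data.nc", "b2safe.example.org", "/zone/home/user/data.nc"
)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where a GridFTP client and valid credentials are available.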
  18. 18. HTTP API
       - The user authenticates with username/password.
       - OAuth2: the HTTP API gets an OAuth2 token from B2ACCESS and provides an API token to the user.
       - The HTTP API talks with B2SAFE on behalf of the user, using the OAuth2 token.
       - B2SAFE validates the OAuth2 token and gets the user attributes to map the user onto a local account.
       - Upload: data are streamed from the HTTP client to B2SAFE, without being cached at the HTTP API server.
       - Download: data are streamed from B2SAFE to the HTTP client, without being cached at the HTTP API server.
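Streaming without caching means the intermediate server forwards data in bounded chunks instead of first storing the whole file. The following is a generic sketch of that idea using in-memory streams. It is not the HTTP API's actual implementation, and the chunk size and function names are assumptions.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size; only one chunk is held at a time


def iter_chunks(source: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def relay(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to sink chunk by chunk and return the number of bytes moved."""
    total = 0
    for chunk in iter_chunks(source, chunk_size):
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for the HTTP client body and the storage back end.
client_body = io.BytesIO(b"x" * 200_000)
storage = io.BytesIO()
copied = relay(client_body, storage)
print(copied)
```

The same loop works in the download direction by swapping which stream plays the role of source and sink.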
  19. 19. HTTP API authentication: user claudio authenticates with username/password. A new file is ready to be uploaded.
  20. 20. HTTP API upload / download
  21. 21. DPM web portal: policy editor
  22. 22. Acceptable Usage Policy
  23. 23. Featured use cases: CLARIN (technology-infrastructure); ClimateModel (seasonal-to-decadal-climate-and-air-quality-modelling); EISCAT; EPOS; Herbadrop (specimen-images); IST (time-archiving-and-visibility-to-the-repository-of-ist); VPH; SDC
  24. 24. SeaDataCloud: the challenge
       - The SeaDataNet portal (CDI: Common Data Index) collects only part of the data produced by more than one hundred marine research institutions. The rest are stored locally by the institutions and offered to users after a request via email. They are made accessible via a temporary web service endpoint.
       - The quality checks are performed by the local institutions, without any central mechanism, so the risk of inconsistencies and duplications is high.
       - There is no Virtual Research Environment, only a set of desktop and web applications that are independent from each other. The user is forced to upload the data set that she wants to analyze and to download the result: there is no shared data space, nor is there a personal one.
  25. 25. SeaDataCloud: B2SAFE and B2STAGE, with B2HANDLE
  26. 26. SeaDataCloud: the solution
       - The B2SAFE and B2STAGE services are hidden behind the community web portal (CDI), which takes care of user management and of community specific metadata registration (DATA DISCOVERY).
       - Each of the five EUDAT data centers offers a B2SAFE instance federated with the others.
       - Each data center provides two storage areas:
         - one for the ingestion of new data uploaded by the data producers, which are the hundreds of marine science institutions of SeaDataNet (DATA TRANSFER);
         - one for the production ready data, which have been validated by the data manager through the community web portal.
       - The community web portal triggers quality check workflows on the B2SAFE and B2HOST side (DATA PLANNING, DATA CURATION).
       - Once moved into the production area, the data are replicated following a star pattern: each replica has the same master copy. A B2HANDLE PID is associated with them (LONG TERM DATA PRESERVATION).
       - Data can then be shared with applications running on the B2HOST environment (DATA TRANSFER).
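In a star replication pattern every replica is copied directly from the one master copy, rather than from another replica. A minimal sketch of such a plan across five data centers follows; the site names are hypothetical and the function is illustrative, not part of B2SAFE.

```python
def star_replication_plan(master: str, sites: list[str]) -> list[tuple[str, str]]:
    """Return (source, destination) pairs where the source is always the master."""
    return [(master, site) for site in sites if site != master]


# Hypothetical names for the five data centers.
sites = ["site-1", "site-2", "site-3", "site-4", "site-5"]
plan = star_replication_plan("site-1", sites)

for source, destination in plan:
    print(f"{source} -> {destination}")
```

Since every pair starts at the master, a corrupted replica cannot propagate to other sites, which would be possible in a chain where each site copies from the previous one.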
  27. 27. SeaDataCloud: B2SAFE solution. SDC community web portal; B2HANDLE; data planning; data discovery; data access; data transfer; long term data preservation; data curation
  28. 28. Documentation: training
  29. 29. @EOSC_eu