eosc-hub.eu
@EOSC_eu
EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
EUDAT B2SAFE
Content
3/5/2019
Motivation and driving considerations behind the service
Service architecture and interfaces: overview
- How the user can access the service (e.g. REST, GUI, CLIs)
- Service options and attributes
Acceptable Usage Policy (AUP)
Use cases
Documentation/tutorial/information
Motivation
A research community wants to improve the services offered to its users, guaranteeing that:
- their data will still be available years from now;
- the data are easily accessible both to a researcher with just a browser and to a data manager who needs to transfer massive amounts of data;
- the data are easily discoverable through a well-defined set of metadata attributes and tools;
- the data can be moved to computing resources when needed, and back;
- these improvements will not disrupt user workflows, because they will be implemented transparently through seamless integration with the current community services, which will enforce the authorization policies defined by the community.
Data planning
Long term data preservation
Data curation
Data access
Data discovery
Well defined API and protocols
Data transfer
Distributed architecture 1
EUDAT infrastructure (CDI)
• Different administrative domains
• We need to federate them to offer common user management
Easy access
Distributed architecture 2
Storage systems
User interfaces
Federation
Architecture 1
EUDAT has built an additional layer on top of iRODS to streamline the processes that support replication and long-term data archiving.
iRODS + EUDAT B2SAFE package + back-end storage = B2SAFE service
http://eudat.eu/services/userdoc/configure-b2safe
https://github.com/EUDAT-B2SAFE/B2SAFE-core
Architecture 2
EUDAT B2SAFE package
EUDAT B2SAFE package = rules + scripts
B2SAFE package rules and scripts
B2SAFE Data Policy Manager architecture
DPM
Interfaces towards other services
DPM
metadata
PID catalog
Interfaces towards other services: data flow 1
thanks to www.vecteezy.com for the pictures
Community data
Policies: data are stored according to the rules defined by the community
data are identified
data are registered
data are made discoverable
data can be easily retrieved
data can be easily moved
data are secured
Interfaces towards other services: data flow 2
Data are stored according to the rules defined by the community
Data are identified
Data are made discoverable
Data are registered
Data can be easily retrieved
Data are secured
Data can be easily moved
A set of EUDAT rules is defined, implementing the most common data flows. Community-specific rules are added when needed.
Long term data preservation
Persistent Identifiers (PIDs) are associated with the data and registered in the B2HANDLE service.
PIDs are globally resolvable and can be used in the B2SHARE and B2STAGE services.
Data are replicated across different nodes of the EUDAT CDI according to the defined policy, making them tolerant to single-node failures and single-copy corruption.
The HTTP API and GridFTP allow data to be downloaded and uploaded using standard protocols.
Data discovery
Data transfer
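The global resolvability of PIDs mentioned on this slide can be illustrated with a short Python sketch. It assumes that the PIDs are resolved through the public Handle proxy at hdl.handle.net and its JSON endpoint (`/api/handles/<handle>`); the handle `11304/example-pid`, the sample record and the `example.org` location are made up for illustration and do not describe a real data object.

```python
from typing import List
from urllib.parse import quote

# Assumption: PIDs are resolved through the public Handle proxy.
HANDLE_PROXY = "https://hdl.handle.net"


def handle_api_url(pid: str) -> str:
    """Build the proxy REST URL that returns the record of a PID as JSON."""
    return f"{HANDLE_PROXY}/api/handles/{quote(pid, safe='/')}"


def extract_locations(record: dict) -> List[str]:
    """Return the data of every value of type URL stored in a Handle record."""
    return [
        value["data"]["value"]
        for value in record.get("values", [])
        if value.get("type") == "URL"
    ]


# Illustrative record shaped like a proxy response; the content is made up.
sample_record = {
    "responseCode": 1,
    "handle": "11304/example-pid",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string", "value": "https://example.org/data/object"}},
        {"index": 2, "type": "CHECKSUM",
         "data": {"format": "string", "value": "abc123"}},
    ],
}

print(handle_api_url(sample_record["handle"]))
print(extract_locations(sample_record))
```

Against a real PID, the record would be obtained by requesting the URL returned by `handle_api_url` and decoding the JSON body before passing it to `extract_locations`.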
Interfaces towards other services: Data Policy Manager 1
DPM
Definition of policies for data management
Policies life cycle management
Policies translation
Policies enforcement
User authentication
Data manager
Resource provider
Resource provider feedback
Interfaces towards other services: Data Policy Manager 2
User authentication: the DPM relies on B2ACCESS for authentication through the Shibboleth protocol.
Definition of policies for data management: policies are implemented as XML documents, which can be created through a web portal.
Policies translation: the policies, described in a high-level language, are translated into B2SAFE rules.
Policies enforcement: the B2SAFE rules are scheduled according to the policy trigger and executed by the rule engine.
Resource provider feedback: the status of the policy is reported back to the data manager. It can be waiting in a queue, enforced, rejected by the resource provider, or completed.
Policies life cycle management: policies are stored in an XML DB and identified through a unique id. They can be modified and removed.
Data curation
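The policy life cycle described on this slide can be sketched in Python as a small data model. This is only an illustration: the XML element names (`policy`, `action`, `trigger`, `status`), the `Policy` class and the choice of which states count as final are assumptions, not the actual DPM schema.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from enum import Enum


class PolicyStatus(Enum):
    """The four states a policy can report back to the data manager."""
    QUEUED = "queued"
    ENFORCED = "enforced"
    REJECTED = "rejected"
    COMPLETED = "completed"


# Assumption: a rejected or completed policy needs no further processing.
FINAL_STATES = {PolicyStatus.REJECTED, PolicyStatus.COMPLETED}


@dataclass
class Policy:
    action: str    # what to do, e.g. "replicate" (hypothetical value)
    trigger: str   # when to do it, e.g. "daily" (hypothetical value)
    policy_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: PolicyStatus = PolicyStatus.QUEUED

    def is_final(self) -> bool:
        """Tell whether the policy has reached a final state."""
        return self.status in FINAL_STATES

    def to_xml(self) -> str:
        """Serialize the policy to an XML document (hypothetical layout)."""
        root = ET.Element("policy", {"id": self.policy_id})
        ET.SubElement(root, "action").text = self.action
        ET.SubElement(root, "trigger").text = self.trigger
        ET.SubElement(root, "status").text = self.status.value
        return ET.tostring(root, encoding="unicode")


policy = Policy(action="replicate", trigger="daily")
print(policy.to_xml())
policy.status = PolicyStatus.COMPLETED
print(policy.is_final())
```

Keeping the unique id as an attribute of the root element mirrors the statement that stored policies are identified through a unique id and can later be modified or removed.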
How to access the service 1
iRODS icommands: a set of CLI commands that can be deployed through RPM or DEB packages (https://irods.org/download).
Davrods: a WebDAV interface on top of iRODS (https://github.com/UtrechtUniversity/davrods).
The B2STAGE service offers two interfaces for B2SAFE:
- the GridFTP iRODS-DSI, which enables fast data transfer through the GridFTP protocol;
- the HTTP API, which provides a RESTful interface towards EUDAT services.
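As an illustration of the command-line access route, the following Python sketch builds the argument lists for three common iRODS icommands (`iput`, `iget`, `ils`). The zone name, collection and file names are hypothetical, and the commands are only printed, since running them requires an installed and configured icommands client.

```python
import shlex
from typing import List


def iput_command(local_path: str, collection: str, verify_checksum: bool = True) -> List[str]:
    """Upload a local file into an iRODS collection.

    The -K flag asks iput to verify the checksum after the transfer.
    """
    cmd = ["iput"]
    if verify_checksum:
        cmd.append("-K")
    return cmd + [local_path, collection]


def iget_command(data_object: str, local_path: str) -> List[str]:
    """Download an iRODS data object to a local path."""
    return ["iget", data_object, local_path]


def ils_command(collection: str) -> List[str]:
    """List a collection in long format."""
    return ["ils", "-l", collection]


# Hypothetical zone and user home collection.
home = "/exampleZone/home/alice"
for cmd in (
    iput_command("results.nc", home),
    ils_command(home),
    iget_command(f"{home}/results.nc", "results_copy.nc"),
):
    print(shlex.join(cmd))
```

On a machine where the client has been initialised, each list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.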
The GridFTP iRODS-DSI
● DSI (Data Storage Interface): GridFTP can be extended to support different underlying storage systems
● Implemented using the iRODS C API
● Supports the main iRODS operations (get, put, delete, list, checksum calculation)
The GridFTP iRODS-DSI allows users to manage data on EUDAT nodes (B2SAFE) through any standard GridFTP client:
UberFTP
Globus Online
globus-url-copy
WebFTS
FTS3 Rest CLI
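To make the GridFTP route concrete, here is a minimal Python sketch that assembles a `globus-url-copy` upload command. The host name and remote path are hypothetical, the sketch assumes that the remote path is the logical path of the object on the B2SAFE node, and the number of parallel streams is an arbitrary choice.

```python
from pathlib import Path
from typing import List

DEFAULT_GRIDFTP_PORT = 2811  # conventional GridFTP control port


def gridftp_upload_command(
    local_file: str,
    host: str,
    remote_path: str,
    streams: int = 4,
    port: int = DEFAULT_GRIDFTP_PORT,
) -> List[str]:
    """Build a globus-url-copy command that uploads one local file."""
    if not remote_path.startswith("/"):
        raise ValueError("remote_path must be absolute")
    source_url = Path(local_file).resolve().as_uri()   # file:///...
    dest_url = f"gsiftp://{host}:{port}{remote_path}"
    # -p sets the number of parallel data streams.
    return ["globus-url-copy", "-p", str(streams), source_url, dest_url]


cmd = gridftp_upload_command(
    "observations.nc",
    "gridftp.example.org",                       # hypothetical endpoint
    "/exampleZone/home/alice/observations.nc",   # hypothetical remote path
)
print(" ".join(cmd))
```

Swapping the source and destination URLs gives the corresponding download command.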
HTTP API
The user authenticates with username/password
OAuth2: the HTTP API gets an OAuth2 token from B2ACCESS and provides an API token to the user
The HTTP API talks with B2SAFE on behalf of the user, using the OAuth2 token
B2SAFE validates the OAuth2 token and gets the user attributes to map the user to a local account
Upload: data are streamed from the HTTP client to B2SAFE, without being cached at the HTTP API server
Download: data are streamed from B2SAFE to the HTTP client, without being cached at the HTTP API server
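The upload step can be sketched from the client side in Python. This is a sketch under stated assumptions: the base URL `https://http-api.example.org/files`, the use of `PUT`, and the `Bearer` authorization scheme are illustrative choices, not the documented layout of the B2STAGE HTTP API. The point it shows is that the body is passed as a file-like object, so the client streams it instead of loading it into memory first.

```python
import io
import urllib.request
from typing import BinaryIO


def build_upload_request(
    base_url: str,
    remote_path: str,
    api_token: str,
    stream: BinaryIO,
    size: int,
) -> urllib.request.Request:
    """Prepare a streamed upload request carrying the user's API token."""
    url = f"{base_url.rstrip('/')}/{remote_path.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {api_token}",   # assumed token scheme
        "Content-Type": "application/octet-stream",
        "Content-Length": str(size),              # needed for a streamed body
    }
    # A file-like object as data is read and sent incrementally.
    return urllib.request.Request(url, data=stream, headers=headers, method="PUT")


payload = io.BytesIO(b"example content")
request = build_upload_request(
    "https://http-api.example.org/files",   # hypothetical base URL
    "/exampleZone/home/alice/new_file.txt",
    "API_TOKEN",                            # placeholder, not a real token
    payload,
    len(payload.getvalue()),
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)`; for a real file, `stream` would be the object returned by `open(path, "rb")` and `size` the result of `os.path.getsize(path)`.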
HTTP API authentication
User claudio is authenticating with username/password
A new file is ready to be uploaded
HTTP API upload / download
Upload
Download
DPM web portal: policy editor
Acceptable Usage Policy
http://hdl.handle.net/11304/e43b2e3f-83c5-4e3f-b8b7-18d38d37a6cd
Featured use cases
Use cases
CLARIN https://www.eudat.eu/communities/common-language-resources-and-technology-infrastructure
ClimateModel https://www.eudat.eu/communities/support-to-scientific-research-on-seasonal-to-decadal-climate-and-air-quality-modelling
EISCAT https://www.eudat.eu/communities/unified-access-to-eiscat-radar-data
EPOS https://www.eudat.eu/communities/european-plate-observing-system
Herbadrop https://www.eudat.eu/communities/long-term-preservation-of-herbarium-specimen-images
IST https://www.eudat.eu/communities/eudat-services-to-guarantee-long-time-archiving-and-visibility-to-the-repository-of-ist
VPH https://www.eudat.eu/communities/virtual-humans
SDC https://www.seadatanet.org/About-us/SeaDataCloud
SeaDataCloud: the challenge
The SeaDataNet portal (CDI: Common Data Index) collects only part of the data produced by more than one hundred marine research institutions.
The rest are stored locally by the institutions and offered to users upon request via email; they are then made accessible through a temporary web service endpoint.
Quality checks are performed by the local institutions, without any central mechanism, so the risk of inconsistencies and duplication is high.
There is no Virtual Research Environment, only a set of desktop and web applications that are independent of each other. Users are forced to upload the data set they want to analyze and to download the result: there is neither a shared data space nor a personal one.
SeaDataCloud: B2SAFE and B2STAGE
B2HANDLE
SeaDataCloud: the solution
The B2SAFE and B2STAGE services are hidden behind the community web portal (CDI), which manages user and community-specific metadata registration (DATA DISCOVERY).
Each of the five EUDAT data centers offers a B2SAFE instance federated with the others.
Each data center provides two storage areas:
- one for the ingestion of new data uploaded by the data producers, which are the hundreds of marine science institutions of SeaDataNet (DATA TRANSFER);
- one for production-ready data, which have been validated by the data manager through the community web portal.
The community web portal triggers quality check workflows on the B2SAFE and B2HOST side (DATA PLANNING, DATA CURATION).
Once moved into the production area, the data are replicated following a star pattern, in which every replica is made from the same master copy, and a B2HANDLE PID is associated with them (LONG TERM DATA PRESERVATION).
Data can then be shared with applications running in the B2HOST environment (DATA TRANSFER).
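The star replication pattern used in this solution can be sketched in a few lines of Python. The site names are hypothetical, and the SHA-256 comparison is one possible way to detect a corrupted copy, chosen for illustration rather than taken from the B2SAFE rules.

```python
import hashlib
from typing import Dict, List, Tuple


def star_replication_plan(master: str, sites: List[str]) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs where every copy comes from the master."""
    return [(master, site) for site in sites if site != master]


def corrupted_replicas(master_copy: bytes, replicas: Dict[str, bytes]) -> List[str]:
    """Name the sites whose replica no longer matches the master checksum."""
    expected = hashlib.sha256(master_copy).hexdigest()
    return sorted(
        site
        for site, content in replicas.items()
        if hashlib.sha256(content).hexdigest() != expected
    )


# Five hypothetical data centers, one of which holds the master copy.
sites = ["site-a", "site-b", "site-c", "site-d", "site-e"]
for source, destination in star_replication_plan("site-a", sites):
    print(f"{source} -> {destination}")

master_bytes = b"validated data set"
copies = {"site-b": b"validated data set", "site-c": b"validated data sex"}
print(corrupted_replicas(master_bytes, copies))
```

Because every replica has the same source, a damaged replica can be repaired directly from the master without involving the other sites.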
SeaDataCloud: B2SAFE solution
B2HANDLE
Data planning
SDC community web portal
Data discovery
Data access
Data transfer
Data transfer
Long term data preservation
Data curation
Documentation
https://eudat.eu/services/userdoc/b2safe
https://github.com/EUDAT-Training/B2SAFE-B2STAGE-Training
https://github.com/EUDAT-B2SAFE/B2SAFE-core/wiki
https://github.com/EUDAT-B2SAFE/B2SAFE-DPM/wiki