This presentation briefly introduces the B2SHARE data store and publication service of EUDAT.
B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store and publish research data coming from diverse contexts.
B2SHARE is part of the EUDAT CDI and is directly connected to the B2DROP and B2SAFE services for data import and export.
Other services like B2HANDLE are used to store persistent identifiers. Additional annotations such as provenance data are stored using B2NOTE. In the future, B2ACCESS will be required for authentication and authorization.
The B2FIND service is used for metadata harvesting so that your records can be found in the more general search service of EUDAT.
B2SHARE has a web-based GUI, but it also offers a REST API. The B2SHARE API is a set of instructions to interact with a B2SHARE service instance, for example the one of EUDAT. It provides direct interaction with the service without using a graphical user interface. Instead, a command line tool or an application that integrates the communication is used. For example, the tools used in a research workflow might include automatic upload of final data in order to directly publish it. Most programming languages have built-in functionality to interact with a remote service using HTTP requests.
The B2SHARE REST API can be used by any user without registration and without limitations. The only functionality that requires registration is the creation and modification of records and defined communities.
Using the B2SHARE API, the user can do several important operations such as listing all existing records and communities that are currently defined in the service. When a user is looking for specific records and/or communities the query functionality can be used and filtered using several parameters.
Once a community has been chosen, detailed community-specific information, including community metadata schema definitions and other requirements can be retrieved.
New draft records can be created, including the upload of files and the addition of metadata in accordance with the metadata schema defined by a community or project. Once a draft record is completed, it can be published. Furthermore, existing published records can be modified in terms of metadata and published as a new versioned record. The old version will always be available and citable.
The next three slides may be common knowledge, so we can skip them if everybody knows about these. API stands for Application Programming Interface. It is a specification of a set of definitions, protocols and tools that allow interaction with a service, possibly from a remote location. An API provides an abstraction layer over the underlying technologies of a service, so that the service itself and external applications can communicate with those technologies without knowing exactly how they work. This allows machines to easily interact with the service and correctly handle responses and possible errors.
When it comes to APIs for the Internet, communication is done through HTTP requests which return machine-readable structures of data representing the current state of a piece of information. The API can often be used to alter information on the server side as well. The returned data is usually provided in specific formats like XML or JSON. In many cases this can be changed upon request.
Your own browser is using the same mechanisms to get data from a server and present it to the user.
APIs are used by making HTTP requests. These are specific calls to a service using a supported method with address and parameters. The address is structured as a URL which consists of a protocol, hostname, port (often not necessary) and a path which possibly identifies the piece of information requested. The path is often called the API endpoint. Multiple endpoints can exist for a given service.
Parameters are additional options given to the request which can further filter the return information or specify for example the return format.
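To make the address structure concrete, here is a minimal Python sketch that assembles a URL from its parts and takes it apart again using only the standard library. The helper name `build_url` and the search term "climate" are made up for illustration; the hostname and path are taken from the examples later in this presentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def build_url(hostname, path, params=None, scheme="https", port=None):
    """Assemble protocol + hostname + optional port + path + parameters."""
    netloc = hostname if port is None else f"{hostname}:{port}"
    query = urlencode(params or {})
    return urlunsplit((scheme, netloc, path, query, ""))


# "climate" is a made-up search term used only for illustration.
url = build_url("trng-b2share.eudat.eu", "/api/records", {"q": "climate"})
parts = urlsplit(url)

print(url)
print(parts.scheme, parts.hostname, parts.port, parts.path)
print(parse_qs(parts.query))
```

The port is left out by default, which matches the remark that it is often not necessary.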
All requests always return a response, provided that the server is available, even when the request is malformed or some other error occurred. Depending on the success, the response contains an HTTP status code and a response text with further information.
On success the current state of the requested piece of information is provided.
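As a rough illustration of how an application might interpret the status code of a response, the following sketch groups standard HTTP status codes into outcomes. The helper `describe_status` is hypothetical and the grouping follows the general HTTP conventions, not anything specific to B2SHARE.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable summary of an HTTP status code."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "informational or redirect"
    return f"{code} {phrase}: {outcome}"


for code in (200, 201, 400, 401, 404, 500):
    print(describe_status(code))
```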
Each HTTP request is accompanied by an HTTP request method, such as the GET method, which requests data from a specified resource. Other possibilities (among others) are POST for submitting data, PUT for uploading files and PATCH for altering existing data. In general, certain operations (like modification) are not tied to one specific method; this depends entirely on the API specification and implementation. In B2SHARE, each method is used for specific operations.
Along with the request method, URI and authentication, the request also sends a payload if necessary. This can be text, binary data or even entire files.
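The combination of method, address and payload can be sketched in Python without sending anything over the network. The request objects below are only constructed, not executed. The field name "title", the file name "example.bin" and the exact paths used for POST and PUT are assumptions for illustration.

```python
import json
from urllib.request import Request

BASE = "https://trng-b2share.eudat.eu/api"

# GET: no payload, only an address.
get_request = Request(f"{BASE}/records", method="GET")

# POST: a JSON text payload (the field name "title" is hypothetical).
post_body = json.dumps({"title": "Example record"}).encode("utf-8")
post_request = Request(
    f"{BASE}/records",
    data=post_body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# PUT: a binary payload, for example the bytes of a file.
put_request = Request(
    f"{BASE}/files/FILE_BUCKET_ID/example.bin",
    data=b"\x00\x01\x02",
    headers={"Content-Type": "application/octet-stream"},
    method="PUT",
)

for request in (get_request, post_request, put_request):
    size = 0 if request.data is None else len(request.data)
    print(request.get_method(), request.full_url, f"{size} payload bytes")
```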
To actually make a request, use a designated tool or application, or integrate it into your own application using a programming language which supports this. Examples are GUI applications like file transfer tools or web interfaces on websites. On the command line typical examples are curl and wget.
Most programming languages support making requests over HTTP, but often require the inclusion of packages which provide this functionality.
For more complex operations (like publication), a dedicated interface or command line application is required and therefore a file transfer tool often doesn’t suffice.
There are several important concepts used in B2SHARE which are relevant to using the API.
First there are communities that curate datasets which are part of the scientific domain or a research project. Community administrators maintain the metadata schema of the community. Users have to select the community when creating new records in order to have them connected to that community.
Records contain data files and associated metadata and are always connected to a specific community. Communities actively curate records that are published under their name. When publishing under the EUDAT general community, this is not the case.
Metadata are a set of common fixed metadata fields and custom metadata blocks with additional fields. They are governed by fixed and community metadata schemas. When choosing a specific community, there might be additional metadata fields to be filled in.
A state is the current condition of a record: draft, submitted or published. The state can be changed through the API. Only published records are visible in B2SHARE. Draft and submitted records are only visible to the owner and possibly the administrator of the community under which the record is published.
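The three states and their visibility can be modelled in a few lines of Python. This is only a sketch: the allowed transitions below are an assumption derived from the description in these notes (a draft is either published directly or first submitted for community approval), not a statement about how B2SHARE implements them.

```python
from enum import Enum


class RecordState(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    PUBLISHED = "published"


# Assumed transitions: draft -> submitted -> published, or draft -> published
# when the community does not require approval.
ALLOWED_TRANSITIONS = {
    RecordState.DRAFT: {RecordState.SUBMITTED, RecordState.PUBLISHED},
    RecordState.SUBMITTED: {RecordState.PUBLISHED},
    RecordState.PUBLISHED: set(),
}


def is_publicly_visible(state):
    """Only published records are visible to everyone."""
    return state is RecordState.PUBLISHED


def can_transition(current, target):
    """Check a state change against the assumed transition table."""
    return target in ALLOWED_TRANSITIONS[current]


print(is_publicly_visible(RecordState.DRAFT))
print(can_transition(RecordState.DRAFT, RecordState.SUBMITTED))
```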
B2SHARE defines several request variables that function as identifiers for objects in B2SHARE. They are used in most HTTP request addresses as part of the path to access specific objects directly.
The most important variables are the community ID, which uniquely identifies a community, and the record ID, which identifies a specific record in either draft or published state.
The file bucket ID is used to identify the set of files of a specific record.
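A small sketch of how these identifiers end up in the path of an address. The record and file paths follow the curl examples shown later in this presentation; the per-community path is an assumption, and the identifier values used here are placeholders.

```python
from urllib.parse import quote

BASE_URL = "https://trng-b2share.eudat.eu"


def record_url(record_id):
    """Address of a single record, identified by its record ID."""
    return f"{BASE_URL}/api/records/{quote(record_id, safe='')}"


def files_url(file_bucket_id):
    """Address of the set of files of a record, identified by its bucket ID."""
    return f"{BASE_URL}/api/files/{quote(file_bucket_id, safe='')}"


def community_url(community_id):
    """Assumed address of a single community (path not shown in the slides)."""
    return f"{BASE_URL}/api/communities/{quote(community_id, safe='')}"


print(record_url("RECORD_ID"))
print(files_url("FILE_BUCKET_ID"))
print(community_url("COMMUNITY_ID"))
```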
All records are published under a specific community and have metadata added according to the corresponding community metadata schema in case this is required.
Metadata schemas contain the descriptions, vocabularies and expected structure and format of every metadata field in the schema. Furthermore, they define which fields are mandatory to fill in during the creation of new records.
Community metadata schema definitions are publicly available through the API and on the B2SHARE website.
There are two ways of adding metadata to a record using the API: upon creation of a draft record and by providing so-called JSON patches in a request. This can also be done to already published records, but will create a new version of that record with new PIDs and checksums.
JSON patch requests modify the current state of the metadata by either changing, adding or removing fields and values. The structure of the data provided in the patch request must strictly follow the metadata schema of the community.
You can make as many patch requests as necessary before publishing your draft record or new published version.
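To show what changing, adding and removing fields looks like, here is a simplified sketch of a JSON patch and its effect on a metadata document. The field names are hypothetical, and the small `apply_patch` helper only handles top-level fields; it is an illustration of the idea, not a complete JSON Patch implementation and not the code B2SHARE runs.

```python
import json


def apply_patch(metadata, patch):
    """Apply 'add', 'replace' and 'remove' operations on top-level fields."""
    result = dict(metadata)
    for operation in patch:
        key = operation["path"][1:]  # "/title" -> "title"
        if operation["op"] == "add":
            result[key] = operation["value"]
        elif operation["op"] == "replace":
            if key not in result:
                raise KeyError(f"cannot replace missing field: {key}")
            result[key] = operation["value"]
        elif operation["op"] == "remove":
            del result[key]
        else:
            raise ValueError(f"unsupported operation: {operation['op']}")
    return result


# Hypothetical metadata and field names, for illustration only.
metadata = {"title": "Old title", "keywords": ["draft"]}
patch = [
    {"op": "replace", "path": "/title", "value": "New title"},
    {"op": "add", "path": "/description", "value": "Added later"},
    {"op": "remove", "path": "/keywords"},
]

print(json.dumps(patch, indent=2))  # the body that a patch request would carry
print(apply_patch(metadata, patch))
```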
Files can be added during the draft phase of your new record using a PUT request. Files are uploaded into the file bucket of the draft record, not the draft record itself. Therefore use the file bucket ID found in the metadata of the draft record. All added files have their checksum calculated.
Files are uploaded one-by-one in separate requests. You can upload as many files as necessary before publishing your draft record or published version.
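A file upload could be prepared along the following lines. The request is only built here, not sent. Appending the file name to the file bucket path and passing the token in a parameter named `access_token` are assumptions; check the API documentation of your B2SHARE instance before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_upload_request(base_url, file_bucket_id, filename, content, access_token):
    """Build (but do not send) a PUT request that uploads one file."""
    path = f"/api/files/{quote(file_bucket_id, safe='')}/{quote(filename, safe='')}"
    query = urlencode({"access_token": access_token})  # assumed parameter name
    return Request(
        f"{base_url}{path}?{query}",
        data=content,
        headers={"Content-Type": "application/octet-stream"},
        method="PUT",
    )


# One request per file, as files are uploaded one by one.
files = {"results.csv": b"a,b\n1,2\n", "read me.txt": b"hello"}
requests = [
    build_upload_request("https://trng-b2share.eudat.eu", "FILE_BUCKET_ID", name, data, "YOUR_TOKEN")
    for name, data in files.items()
]
for request in requests:
    print(request.get_method(), request.full_url)
```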
Draft records are published by altering the value of the publication state in the metadata.
Once your record is published, the included files can not be changed anymore and no new files can be added! Metadata can be changed after publication, but will create a new version of your published record.
After your publication request, the persistent identifiers are automatically added to the record.
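Publishing by altering the publication state might look like the sketch below, which builds a PATCH request carrying a one-operation JSON patch. The field name `publication_state`, the value "submitted", the `/draft` suffix in the path and the `access_token` parameter are all assumptions made for illustration; only the general mechanism (a patch that changes the state) comes from these notes.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_publish_request(base_url, record_id, access_token):
    """Build (but do not send) a PATCH request that changes the state."""
    # Assumed field name and value for the publication state.
    patch = [{"op": "add", "path": "/publication_state", "value": "submitted"}]
    query = urlencode({"access_token": access_token})
    url = f"{base_url}/api/records/{quote(record_id, safe='')}/draft?{query}"
    return Request(
        url,
        data=json.dumps(patch).encode("utf-8"),
        headers={"Content-Type": "application/json-patch+json"},
        method="PATCH",
    )


publish_request = build_publish_request(
    "https://trng-b2share.eudat.eu", "RECORD_ID", "YOUR_TOKEN"
)
print(publish_request.get_method(), publish_request.full_url)
print(publish_request.data.decode("utf-8"))
```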
We will now look at some examples of using the API. For many requests you need to authenticate yourself. As APIs are often used directly in applications, you don’t want to provide usernames and passwords for every request you make. Instead, an access token is generated on the website which uniquely identifies you during a request.
B2SHARE contains public (open access) and restricted data. All open access published records and metadata are public and can be accessed by anyone; this mostly holds for the contained files as well. Therefore no access tokens are required. For privately shared records and your draft records and files, an access token is required. Depending on the authorization and community settings, you may or may not be able to access these records.
As the access token uniquely identifies you and allows alteration of your published and draft records, do not share it with anyone!
To generate an access token, go to the B2SHARE website, log in and navigate to the profile page.
Create a new token by entering a name, followed by a click on the new token button. Note that you will not be able to retrieve this token later, so store it safely.
If you lose your token, you can create a new one on this page. You can create as many as you like.
Here are some example GET requests using the B2SHARE training instance and the curl application. No access token is necessary yet, as all this information is publicly available.
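Because the token should never be shared, one common approach is to keep it out of your source code and read it from the environment instead. The sketch below assumes an environment variable named `B2SHARE_ACCESS_TOKEN` and a request parameter named `access_token`; both names are assumptions for illustration.

```python
import os
from urllib.parse import urlencode

TOKEN_VARIABLE = "B2SHARE_ACCESS_TOKEN"  # hypothetical variable name


def add_access_token(url, environ=None):
    """Append the access token from the environment to a request address."""
    environ = os.environ if environ is None else environ
    token = environ.get(TOKEN_VARIABLE)
    if not token:
        raise RuntimeError(f"Set the {TOKEN_VARIABLE} environment variable first")
    separator = "&" if "?" in url else "?"
    query = urlencode({"access_token": token})
    return f"{url}{separator}{query}"


# A dictionary stands in for the real environment in this demonstration.
demo_environment = {TOKEN_VARIABLE: "YOUR_TOKEN"}
print(add_access_token("https://trng-b2share.eudat.eu/api/records?drafts=1", demo_environment))
```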
Examples: The ‘retrieve all existing records’ and ‘list all communities’ examples do not use a variable in the address. To search for specific records of a community, use the community ID in the query parameter q. Community IDs can be found in the listing of all communities.
Examples: To get a specific record, use the record ID of that record in the address. To get all your draft records, you now need to add your access token as a parameter. You also need to set the parameter ‘drafts’ to the value 1. To list all the files contained in a record, get the file bucket ID from the record metadata and use it directly in the path of the address.
Note that all commands are a single line only; here they are displayed on multiple lines for clarity.
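The same kind of GET requests can be made from Python instead of curl. The following is a minimal sketch using only the standard library. The helper names are made up, and the parameter name `access_token` is an assumption; the `drafts` parameter with value 1 comes from the examples above. The actual network call is left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://trng-b2share.eudat.eu"


def records_url(drafts=False, access_token=None):
    """Address for listing records, optionally only your own drafts."""
    params = {}
    if drafts:
        params["drafts"] = 1
    if access_token:
        params["access_token"] = access_token  # assumed parameter name
    query = urlencode(params)
    return f"{BASE_URL}/api/records" + (f"?{query}" if query else "")


def fetch_json(url, timeout=10):
    """Perform a GET request and decode the JSON response text."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(records_url())
print(records_url(drafts=True, access_token="YOUR_TOKEN"))
# records = fetch_json(records_url())  # uncomment to contact the service
```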
The full publication workflow using the B2SHARE REST API is as follows:
1. Identify a target community to place your new record under. You need the community ID of that community. Depending on the community, only its members may be able to publish under its name.
2. Get the metadata schema definition of that community using the community ID. Now you know the required fields for your publication.
3. Create a draft record, upload your files and add metadata. This can all be done in multiple separate steps, and files are uploaded one by one. When adding files, checksums will be generated for each file.
4. Finally, publish your record. Depending on the community’s settings, your record may need approval from the community before it is shown in B2SHARE. A persistent identifier is added to the record so that it can be uniquely identified.
For clarity, here is a complete overview of the full publication workflow using the B2SHARE REST API in a diagram. For every step (blue boxes) the corresponding request and HTTP method have been added, in which the variables need to be filled in. The only exception is the ‘community approve’ step, which can’t be done through the API yet.
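The workflow above can be summarised as an ordered plan of requests. This sketch only lists the steps with the HTTP method each one would typically use; using POST to create the draft is an assumption based on the general description of POST earlier in these notes, and no addresses are given because they depend on the service instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    method: str


WORKFLOW = [
    Step("Identify the target community (list communities)", "GET"),
    Step("Retrieve the community metadata schema", "GET"),
    Step("Create a draft record with initial metadata", "POST"),  # assumed method
    Step("Upload files into the file bucket, one by one", "PUT"),
    Step("Add or correct metadata with JSON patches", "PATCH"),
    Step("Publish by changing the publication state", "PATCH"),
]

for number, step in enumerate(WORKFLOW, start=1):
    print(f"{number}. {step.method:<5} {step.description}")
```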
A record can be in three states: draft, submitted or published (red boxes). Draft and submitted records are not visible openly, but only to the user or community.
Checksums are calculated for every file added and a PID is added once the draft record is committed (green boxes).
Have a look at our website for more information regarding B2SHARE.
User documentation is also available here
B2SHARE hands-on training can be found on GitHub. Currently only API access using Python is covered. In the future more modules will be added.
B2SHARE REST API - New ppt available at https://www.slideshare.net/EUDAT/eudat-b2share-api-how-to-store-and-publish-research-data-using-the-b2share-api
Store and Publish Research Data
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
B2SHARE REST API
How to store and publish research data
using the B2SHARE REST API
This work is licensed under the Creative
Commons CC-BY 4.0 licence
… a user-friendly, reliable and
trustworthy way for researchers,
scientific communities and scientists to
store and publish research data from
What is the B2SHARE REST API?
The B2SHARE REST API is a set of instructions to interact
with a B2SHARE service instance
The B2SHARE REST API:
Provides direct remote interaction with the service without
using a graphical user interface
Allows integration within application or data processing
workflows for automation of publishing tasks
Supports any programming language or operating system
that supports HTTP requests
There is no limitation on usage and number of calls, even for
To create new or modify existing content, registration is
What can I do with the B2SHARE
List all existing records and communities
Search for specific records and communities
Retrieval of community-specific information
Including community metadata schemas
Create new draft records
Upload files to draft records
Add metadata using metadata schemas
Publish draft records
Modification of the metadata of existing records
What is an API?
An Application Programming Interface (API) is a
specification of a set of definitions, protocols and tools to
interact with a service, possibly from a remote location
Provides an abstraction of the underlying service technologies
used by the service itself and external applications
HTTP requests return machine-readable structures of data
representing the current state of a piece of information, or
after altering it on request
Returned data is provided in specific formats like XML or
Any browser is using requests in order to get information from
a server and to present it to the user
A specific call to a service through an API using a HTTP
request method with address and parameters
An address is a URL with optionally additional parameters:
URL = protocol + hostname + port + path
Protocol: always http:// or https://
Hostname: base address, e.g. b2share.eudat.eu
Port: sometimes required specifically, usually 80
Path: endpoint specification, e.g. api/record/1
Parameters are additional options given to the request
Every request returns a HTTP status code and response text,
even when an error occurred
On success the current state of the requested piece of
information is provided
HTTP requests (2)
Several HTTP request methods possible, e.g.:
GET: get data of specified resource
POST: submit data to be processed for specified
PUT: upload representation of URI (e.g. file)
PATCH: modify (meta)data of specified source
For different operations, use different methods
The B2SHARE service accepts payloads along with a
HTTP request, e.g. text, binary data, files
Making a request
Requests can be made by using a specific application
directly (e.g. a browser) or by using a programming
language that supports making requests in code
GUI: any file transfer application or web interface
Command line: cURL, wget
Almost all programming languages support making
requests over HTTP
For more complex operations (like publication), a
dedicated interface or command line application is
Making a request (diagram)
[Diagram: a client (Browser, CLI tool, Your app or …) sends a request (request method, URI & parameters) to the service and receives a response (status code, response text)]
Important concepts of B2SHARE
Curate datasets which are part of the scientific domain or a
Maintain their own metadata schemas and have community
Contain data files and associated metadata
Connected to a community which maintains it
Set of common fixed metadata fields and custom metadata
blocks with additional fields
Governed by fixed and community metadata schemas
Current condition of a record, either draft, submitted or published
Can be changed through the API
B2SHARE request variables
B2SHARE defines several request variables that
function as identifiers for objects in B2SHARE
Used in most HTTP request addresses as part of the
path to access specific objects directly
Most important variables:
COMMUNITY_ID: identifier of a user community
RECORD_ID: identifier for a specific record, in either
FILE_BUCKET_ID: identifier for a set of files of a
Community metadata schemas
Every community defines its own metadata schema
Contain descriptions, vocabularies and expected
structure and format of every metadata field
Define which fields are mandatory to fill in
Definitions are publicly available
Metadata is added to a record using the API:
Upon creation of a draft record
Or by providing so-called JSON patches in a request
Patches modify the current state of the metadata by either
changing, adding or removing fields and values
The structure of the data provided in the patch request must
strictly follow the
metadata schema of the
As many patches as necessary
can be applied before
publishing your draft record
Files can be added during the draft phase of your new
record using a PUT request
Files are uploaded into the file bucket of the draft
record, not the draft record itself
Use the file bucket ID found in the metadata of the
Files are uploaded one-by-one in separate requests
As many file upload requests
as necessary can be made
before publishing your draft
Publishing your draft record
Draft records are published by altering the value of the
publication state in the metadata
Once your record is published:
The included files can not be changed anymore!
No new files can be added!
Metadata can be changed after publication, but will
create a new version of your published record
Persistent identifiers are automatically added
Authentication through the API
B2SHARE does not accept username and password
combination, instead use tokens for authentication!
B2SHARE contains open and restricted data:
Public: all published records and metadata, most files
No access token required
Private: your draft records and files in private records
Only accessible using your access token as parameter
in HTTP request
Automatically generated unique string of characters
attached to your account in B2SHARE
Unlimited number of tokens can be generated
Only known by the owner, do not share with others!
Getting your access token
Log in on B2SHARE and navigate to profile page:
Create a token by entering a new name:
Click on ‘New token’
Note: the token will only be shown once, so store it safely!
Simple examples (1)
Protocol and host: https://trng-b2share.eudat.eu
Application: curl (using command-line interface)
HTTP method: GET
Retrieve all existing records:
curl -X GET https://trng-b2share.eudat.eu/api/records
List all communities:
curl -X GET https://trng-b2share.eudat.eu/api/communities
Search for specific records of a community:
curl -X GET https://trng-b2share.eudat.eu/api/records?
Simple examples (2)
Get a specific record:
curl -X GET https://trng-b2share.eudat.eu/api/records/RECORD_ID
List all your draft records:
Access token required!
curl -X GET https://trng-b2share.eudat.eu/api/records?
List files of specific record:
curl -X GET https://trng-b2share.eudat.eu/api/files/FILE_BUCKET_ID
Full publication workflow
Publishing in B2SHARE using the API involves multiple
Identify a target community for your data
Retrieve the metadata schema definition of the
The submitted metadata will have to conform to
Create a draft record:
Upload files into the draft record (one by one)
Add metadata according to schema (possibly in
Publish the record
Full publication workflow diagram
For more info: https://eudat.eu/services/b2share
B2SHARE User Documentation:
B2SHARE Training presentations:
B2SHARE hands-on training:
Hans van Piggelen, SURFsara