EUDAT B2SHARE API - How to store and publish research data using the B2SHARE API
1. Store and Publish Research Data
b2share.eudat.eu
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
B2SHARE REST API
How to store and publish research data
using the B2SHARE REST API
This work is licensed under the Creative
Commons CC-BY 4.0 licence
Version 3
November 2017
2. b2share.eudat.eu
B2SHARE is...
… a user-friendly, reliable and
trustworthy way for researchers,
scientific communities and scientists to
store and publish research data from
diverse contexts
3. b2share.eudat.eu
What is the B2SHARE REST API?
The B2SHARE REST API is a set of instructions to interact
with a B2SHARE service instance
The B2SHARE REST API:
Provides direct remote interaction with the service without
using a graphical user interface
Allows integration within application or data processing
workflows for automation of publishing tasks
Supports any programming language or operating system
that supports HTTP requests
There is no limitation on usage and number of calls, even for
unregistered users
To create new or modify existing content, registration is
required
4. b2share.eudat.eu
Why use the B2SHARE REST API?
Using the B2SHARE API you can automate:
Creation of records
Uploading of files
Adding/changing metadata
Changing the state of publications
Retrieval of object data
Removal of draft (not published) records
Enable:
Precise replication of metadata into repository
Large file uploads
Implement publishing in your own workflow or application
Direct data ingest
Ease administration and overview of your records
5. b2share.eudat.eu
What can I do with the B2SHARE
API?
List (all) existing records and communities
Search for specific records and communities
Retrieve community-specific information
Including community metadata schemas
Create new draft records
Upload files to draft records
Add metadata using metadata schemas
Publish draft records
Modify the metadata of existing records
Delete existing draft records or files therein
More to be added in future releases…
6. b2share.eudat.eu
What is an API?
An Application Programming Interface (API) is a
specification of a set of definitions, protocols and tools to
interact with a service, possibly from a remote location
It provides an abstraction of the underlying service
technologies used by the service itself and external
applications
HTTP requests return machine-readable structures of data
representing the current state of a piece of information, or
after altering it on request
Returned data is provided in specific formats like XML or
JSON
Browsers use such requests in order to get information from a
server and to present it to the user
7. b2share.eudat.eu
HTTP requests
A specific call to a service through an API using a HTTP
request method with address and parameters
An address is a URL with optionally additional parameters:
URL = protocol + hostname + port + path
Protocol: always http:// or https://
Hostname: base address, e.g. b2share.eudat.eu
Port: sometimes required explicitly, usually 80 for http and 443 for https
Path: endpoint specification, e.g. api/record/1
Parameters are additional options given to the request
Every request returns a HTTP status code and response text,
even when an error has occurred
On success the current state of the requested piece of
information is provided
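As a rough illustration of the URL anatomy above, the following sketch uses Python's standard `urllib.parse` module to split an address into the parts named on this slide. The query value `q=test` is a made-up example, and falling back to port 443 for https is general HTTP convention, not something specific to B2SHARE.

```python
from urllib.parse import urlsplit

# Default ports implied by the protocol when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into protocol, hostname, port, path and parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        # parts.port is None unless the URL contains an explicit ":port".
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "parameters": parts.query,
    }


# Hypothetical example address; only the hostname comes from the slides.
example = describe_url("https://b2share.eudat.eu/api/records?q=test")
for name, value in example.items():
    print(name, "=", value)
```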
8. b2share.eudat.eu
HTTP requests (2)
Several HTTP request methods are possible, e.g.:
GET: get data of specified resource
POST: submit data to be processed for specified
resource
PUT: upload representation of URI (e.g. file)
PATCH: modify (meta)data of specified resource
DELETE: remove an existing resource
For different operations, use different methods
The B2SHARE service accepts payloads along with a
HTTP request, e.g. text, binary data, files
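To make the role of the request method concrete, here is a minimal Python sketch that prepares two requests for the same address with different methods. Nothing is sent over the network. The record identifier is a hypothetical placeholder and the empty JSON list is only a stand-in payload.

```python
import urllib.request

# Hypothetical record identifier; a real one is issued by the service.
DRAFT_URL = "https://b2share.eudat.eu/api/records/HYPOTHETICAL_RECORD_ID/draft"

# Same address, different method: one reads the resource, one modifies it.
read_request = urllib.request.Request(DRAFT_URL, method="GET")
modify_request = urllib.request.Request(DRAFT_URL, data=b"[]", method="PATCH")

for request in (read_request, modify_request):
    # Passing a request to urllib.request.urlopen() would actually send it.
    print(request.get_method(), request.full_url)
```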
9. b2share.eudat.eu
Making a request
Requests can be made by using a specific application
directly (e.g. a browser) or by using a programming
language that supports making requests in code
Example applications:
GUI: any file transfer application or web interface
Command line: curl, wget
Almost all programming languages support making
requests over HTTP
For more complex operations (like publication), a
dedicated interface or command line application is
required
10. b2share.eudat.eu
Making a request
Client (browser, CLI tool, your app or workflow) sends an HTTP request to the server: request method, header, URI & parameters, authentication, payloads
Server returns an HTTP response to the client: header, status code, response text
11. b2share.eudat.eu
Important concepts of B2SHARE
Communities:
Curate datasets which are part of the scientific domain or a
research project
Maintain their own metadata schemas and have community
administrators
Records:
Contain data files and associated metadata
Are connected to a community which maintains them
Metadata:
Set of common fixed metadata fields and custom metadata
blocks with additional fields
Governed by fixed and community metadata schemas
States:
Current condition of a record, either draft, submitted or published
Can be changed through the API
12. b2share.eudat.eu
B2SHARE request variables
B2SHARE defines several request variables that
function as identifiers for objects in B2SHARE
Used in most HTTP request addresses as part of the
path to access specific objects directly
Most important variables:
COMMUNITY_ID: identifier of a user community
RECORD_ID: identifier for a specific record, in either draft or
published state
FILE_BUCKET_ID: identifier for a set of files of a
specific record
13. b2share.eudat.eu
Community metadata schemas
Every community defines its own metadata schema
Metadata schemas:
Contain descriptions, vocabularies and expected
structure and format of every metadata field
Define which fields are mandatory to fill in
Definitions are publicly available
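A small Python sketch of how mandatory fields could be checked before submitting metadata. It assumes the schema lists its mandatory fields under a `required` key, as JSON Schema documents do. The field names and community identifier are invented for illustration and are not taken from any real community schema.

```python
def missing_required_fields(metadata, schema):
    """Return the mandatory field names that are absent from metadata."""
    return [name for name in schema.get("required", []) if name not in metadata]


# Hypothetical schema fragment; real schemas are defined by each community.
schema = {"required": ["title", "community", "open_access"]}

# Hypothetical draft metadata that still lacks one mandatory field.
draft_metadata = {"title": "My dataset", "community": "HYPOTHETICAL_COMMUNITY_ID"}

missing = missing_required_fields(draft_metadata, schema)
print("Missing mandatory fields:", missing)
```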
14. b2share.eudat.eu
Draft records and versioning
Draft records:
Can be updated with new
files and metadata
Have publication state ‘draft’
Published records:
Cannot have files added or
changed anymore
Metadata updates possible
but discouraged
B2SHARE supports
versioning of records:
Existing published records
can be updated through their
draft counterpart
Creates new PIDs, bucket
IDs, links
Diagram: Draft (✔ new files, ✔ new metadata), Published (✗ new files, ✗ new metadata), Versioned draft
15. b2share.eudat.eu
Adding metadata
Metadata is added to a record using the API:
Upon creation of a draft record
Or by providing so-called JSON patches in a request
Patches modify the current state of the metadata by
changing, adding or removing fields and values
The structure of the data provided in the patch request
must strictly follow the
metadata schema of the
community
As many patches as necessary
can be applied before
publishing your draft record
Diagram: Add metadata → Draft record
16. b2share.eudat.eu
JSON patch
A set of operations that alter an existing set of metadata
fields based on another set of fields
Loosely equals the difference between two sets
Operations: add, remove, replace, copy, move, test
Can be generated in Python by the jsonpatch package
Requires PATCH request to apply to record
Diagram: OLD metadata, NEW metadata, and the metadata PATCH that describes the difference between them
17. b2share.eudat.eu
Adding files
Files can be added during the draft phase of your new
record using a PUT request
Files are uploaded into the file bucket of the draft
record, not the draft record itself
Use the file bucket ID found in the metadata of the
draft record
Files are uploaded one-by-one in separate requests
As many file upload requests
as necessary can be made
before publishing your draft
record
Diagram: Add files → Draft record
18. b2share.eudat.eu
Publishing your draft record
Draft records are published by altering the value of the
publication state in the metadata
Once your record is published:
The included files can not be changed anymore!
No new files can be added!
Metadata can be changed after publication, but will
create a new version of your published record
Persistent identifiers are automatically added
Diagram: Draft record → Commit → Published record
19. b2share.eudat.eu
Authentication through the API
B2SHARE does not accept username and password
combination, instead use tokens for authentication.
B2SHARE contains open and restricted data:
Public: all published records and metadata, most files
No access token required
Private: your draft records and files in private records
Only accessible using your access token as parameter
in HTTP request
Access tokens:
Automatically generated unique string of characters
attached to your account in B2SHARE
Only known by the owner, do not share with others!
20. b2share.eudat.eu
Getting your access token
Log in on B2SHARE and navigate to profile page:
Create a token by entering a new name:
Click on ‘New token’
Note: the token will only be shown once, so store it safely!
21. b2share.eudat.eu
Simple examples (1)
Protocol and host: https://trng-b2share.eudat.eu
Application: curl (using command-line interface)
HTTP method: GET
Retrieve all existing records:
curl -X GET https://trng-b2share.eudat.eu/api/records
List all communities:
curl -X GET https://trng-b2share.eudat.eu/api/communities
Search for specific records of a community:
curl -X GET 'https://trng-b2share.eudat.eu/api/records?q=community:<COMMUNITY_ID>'
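The same three addresses can be assembled in Python, which may be convenient inside a workflow. This is a minimal sketch using only the standard library. The helper name `api_url` and the community identifier are made up for illustration, and nothing is sent until the address is passed to a function such as `urllib.request.urlopen`.

```python
from urllib.parse import urlencode

BASE_URL = "https://trng-b2share.eudat.eu"  # training instance used in these examples


def api_url(path, **params):
    """Join the base URL, an API path and optional query parameters."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    return url


all_records = api_url("/api/records")
all_communities = api_url("/api/communities")
# Hypothetical community identifier; real ones appear in the community listing.
community_records = api_url("/api/records", q="community:HYPOTHETICAL_COMMUNITY_ID")

for url in (all_records, all_communities, community_records):
    print(url)
```

Note that `urlencode` percent-encodes the colon in the query value as `%3A`, which is an equivalent way of writing the same parameter.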
22. b2share.eudat.eu
Simple examples (2)
Get a specific record:
curl -X GET https://trng-b2share.eudat.eu/api/records/<RECORD_ID>
List all your draft records (access token required!):
curl -X GET 'https://trng-b2share.eudat.eu/api/records?drafts=1&access_token=<ACCESS_TOKEN>'
List files of a specific record (FILE_BUCKET_ID required!):
curl -X GET https://trng-b2share.eudat.eu/api/files/<FILE_BUCKET_ID>
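When requests are built in code, the access token can be appended to any address as a query parameter. The sketch below shows one way to do that with Python's standard library while keeping parameters that are already present. The helper name and the token value are placeholders. In practice the token would more likely be read from a configuration file or environment variable than written into the source.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_access_token(url, token):
    """Return url with an access_token query parameter appended."""
    parts = urlsplit(url)
    # Keep any parameters that are already part of the address.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("access_token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


drafts_url = with_access_token(
    "https://trng-b2share.eudat.eu/api/records?drafts=1",
    "HYPOTHETICAL_TOKEN",  # placeholder, never publish a real token
)
print(drafts_url)
```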
23. b2share.eudat.eu
Full publication workflow
Publishing in B2SHARE using the API involves multiple
steps:
Identify a target community for your data
Retrieve the metadata schema definition of the
community
The submitted metadata will have to conform to this
schema
Create a draft record:
Upload files into the draft record (one by one)
Add metadata according to schema (possibly in
multiple steps)
Publish the record
Note the generated Persistent Identifier (PID)
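The "create a draft record" step could be sketched in Python as follows. The request is only prepared, not sent. The metadata field names (`titles`, `community`, `open_access`), the JSON content type and all identifier values are assumptions made for illustration. The actual required fields are determined by the metadata schema of the chosen community.

```python
import json
import urllib.request

BASE_URL = "https://trng-b2share.eudat.eu"


def create_draft_request(title, community_id, token):
    """Prepare (without sending) a POST request that creates a draft record."""
    # Assumed minimal metadata layout; check the community schema before use.
    metadata = {
        "titles": [{"title": title}],
        "community": community_id,
        "open_access": True,
    }
    return urllib.request.Request(
        BASE_URL + "/api/records?access_token=" + token,
        data=json.dumps(metadata).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


draft_request = create_draft_request(
    "My dataset", "HYPOTHETICAL_COMMUNITY_ID", "HYPOTHETICAL_TOKEN"
)
print(draft_request.get_method(), draft_request.full_url)
print(draft_request.data.decode("utf-8"))
```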
24. b2share.eudat.eu
Full publication workflow diagram
Create draft: POST /api/records
Draft record (retrieve): GET /api/records/<RECORD_ID>/draft
Add metadata: PATCH /api/records/<RECORD_ID>/draft
Add files: PUT /api/files/<FILE_BUCKET_ID>/<FILE_NAME> (a checksum is calculated for every file)
Commit: PATCH /api/records/<RECORD_ID>/draft
Needs approval? NO: published record, PID added
Needs approval? YES: submitted record, community approves, then published record, PID added
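The order of calls in this workflow can be written down as a simple plan. The Python sketch below only lists the steps as (label, method, path) tuples and sends nothing. The identifiers are hypothetical, and in a real run the record and file bucket IDs only become known from the response to the first request. The `/api/files/` prefix for uploads follows the file listing example earlier in this deck and is an assumption here.

```python
def publication_steps(record_id, bucket_id, file_names):
    """List the API calls of the publication workflow in order."""
    draft_path = f"/api/records/{record_id}/draft"
    steps = [("create draft", "POST", "/api/records")]
    steps.append(("add metadata", "PATCH", draft_path))
    # Files are uploaded one by one, each in a separate request.
    for name in file_names:
        steps.append(("add file", "PUT", f"/api/files/{bucket_id}/{name}"))
    steps.append(("commit", "PATCH", draft_path))
    return steps


plan = publication_steps(
    "HYPOTHETICAL_RECORD_ID", "HYPOTHETICAL_BUCKET_ID", ["data.csv", "readme.txt"]
)
for label, method, path in plan:
    print(f"{label:13} {method:5} {path}")
```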
25. b2share.eudat.eu
For more info: https://eudat.eu/services/b2share
B2SHARE User Documentation:
https://eudat.eu/services/userdoc/b2share
B2SHARE Training presentations:
https://www.eudat.eu/b2share-training-suite
B2SHARE hands-on training:
https://github.com/EUDAT-Training/B2SHARE-Training
26. www.eudat.eu
Authors Contributors
Hans van Piggelen, SURFsara
Thank you!
Editor's Notes
This presentation briefly introduces the B2SHARE data store and publication service of EUDAT.
B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store and publish research data coming from diverse contexts.
B2SHARE is part of the EUDAT CDI and is directly connected to the B2DROP and B2SAFE services for data import and export.
Other services like B2HANDLE are used to store persistent identifiers. Additional annotation like provenance data is stored using B2NOTE. B2ACCESS is required for authentication and authorization in the future.
The B2FIND service is used for metadata harvesting so that your records can be found in the more general search service of EUDAT.
B2SHARE has a web-based GUI, but it also offers a REST API. The B2SHARE API is a set of instructions to interact with a B2SHARE service instance, for example the one of EUDAT. It provides direct interaction with the service without using a graphical user interface. Instead, a command line tool or an application that integrates the communication is used. For example, the tools used in a research workflow might include automatic upload of final data in order to directly publish it. Most programming languages have built-in functionality to interact with a remote service using HTTP requests.
The B2SHARE REST API can be used by any user without registration and without limitations. The only functionality that requires registration is the creation and modification of records and communities.
Using the B2SHARE API, the user can automate the creation of draft records, uploading of files, alteration of metadata of draft records, changing the publication state of a draft record (in order to publish it), retrieval of record data and files and the removal of existing draft records.
It enables precise replication of metadata into B2SHARE and large file uploads, which is more difficult to achieve using a web browser and retyping metadata field values.
With the API, you can implement the sharing and publication of datasets into your own workflows or applications.
You can make administration of existing records easier, because you no longer have to use a browser and apply changes to every record separately.
More specifically, using the B2SHARE API, the user can perform several important operations such as listing all existing records and communities that are currently defined in the service. When a user is looking for specific records and/or communities the query functionality can be used and filtered using several parameters.
Once a community has been chosen, detailed community-specific information, including community metadata schema definitions and other requirements can be retrieved.
New draft records can be created, including the upload of files and the addition of metadata in accordance with the metadata schema defined by a community or project. Once a draft record is completed, it can be published. Furthermore, existing published records can be modified in terms of metadata and published as a new versioned record. The old version will always be available and citable.
Draft records can be deleted, as well as files uploaded in these draft records. Published records can only be deleted by the site administrator.
The next three slides may be common knowledge, so we can skip them if everybody knows about these. API stands for Application Programming Interface, and is a specification of a set of definitions, protocols and tools that allow interaction with a service, possibly from a remote location. An API provides an abstraction layer over the underlying service technologies, so that the service itself and external applications can communicate with the technology without knowing exactly how it works. This allows machines to easily interact with the service and correctly handle responses and possible errors.
When it comes to APIs for the Internet, communication is done through HTTP requests which return machine-readable structures of data representing the current state of a piece of information. The API can often be used to alter information on the server side as well. The returned data is usually provided in specific formats like XML or JSON. In many cases this can be changed upon request.
Your own browser is using the same mechanisms to get data from a server and present it to the user.
APIs are used by making HTTP requests. These are specific calls to a service using a supported method with address and parameters. The address is structured as a URL which consists of a protocol, hostname, port (often not necessary) and a path which possibly identifies the piece of information requested. The path is often called the API endpoint. Multiple endpoints can exist for a given service.
Parameters are additional options given to the request which can further filter the return information or specify for example the return format.
All requests return a response, provided that the server is available, even when the request is malformed or some other error occurred. The response contains an HTTP status code that indicates success or failure and a response text with further information.
On success the current state of the requested piece of information is provided.
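As an illustration of how a client might interpret the status code of a response, the sketch below groups codes into the standard HTTP classes using Python's built-in `http.HTTPStatus`. The grouping follows general HTTP convention. Which codes a B2SHARE instance returns for a given operation is not covered here.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable summary of an HTTP status code."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{status.value} {status.phrase} ({outcome})"


for code in (200, 404, 500):
    print(describe_status(code))
```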
Each HTTP request is accompanied by an HTTP request method, such as the GET method which requests data from a specified resource. Other possibilities (among others) are POST for data submits, PUT for uploading files and PATCH for alteration of existing data. Certain operations (like modification) are not specifically limited to a method; this entirely depends on the API specification and implementation. In B2SHARE each method is used for specific operations. Some operations require special privileges.
Along with the request method, URI and authentication, the request also sends a payload if necessary. This can be text, binary data or even entire files.
To actually make a request, use a designated tool or application, or integrate it into your own application using a programming language which supports this. Examples are GUI applications like file transfer tools or web interfaces on websites. On the command line typical examples are curl and wget.
Most programming languages support making requests over HTTP, but often require the inclusion of packages which provide this functionality.
For more complex operations (like publication), a dedicated interface or command line application is required and therefore a file transfer tool often doesn’t suffice.
Here is an overview of how a request works in practice using a client-server model:
A browser, tool or app/workflow creates a request with header, authentication and the actual request payloads (possibly with a reference to an object through an identifier) and sends it to the server (e.g. a B2SHARE instance)
The server processes the request and determines whether it is understood, valid and allowed and if it applies to the right objects
It sends a response back to the original sender together with a header, status code (indicating whether the request was successfully processed) and a response text, which may contain some information on what has changed, or the new state of the referred object in the original request
To successfully render more complex webpages, as is common nowadays, many requests can be made from a browser to a server.
There are several important concepts used in B2SHARE which are relevant to using the API.
First there are communities that curate datasets which are part of the scientific domain or a research project. Community administrators maintain the metadata schema of the community. Users have to select the community when creating new records in order to have them connected to that community.
Records contain data files and associated metadata and are always connected to a specific community. Communities actively curate records that are published under their name. When publishing under the EUDAT general community, no curation takes place.
Metadata are a set of common fixed metadata fields and custom metadata blocks with additional fields. They are governed by fixed and community metadata schemas. When choosing a specific community, there might be additional metadata fields to be filled in.
A state is the current condition of a record, either draft, submitted or published. It can be changed through the API. Only published records are visible in B2SHARE. Draft and submitted records are only visible to the owner and possibly the administrator of the community under which the record is published.
B2SHARE defines several request variables that function as identifiers for objects in B2SHARE. They are used in most HTTP request addresses as part of the path to access specific objects directly
The most important variables are the community ID which uniquely identifies a community. Record IDs identify specific records, either in draft or published state.
The file bucket ID is used to identify a set of files of a specific record
All records are published under a specific community and have metadata added according to the corresponding community metadata schema in case this is required.
Metadata schemas contain the descriptions, vocabularies and expected structure and format of every metadata field in the schema. Furthermore, they define which fields are mandatory to fill in during the creation of new records.
Community metadata schema definitions are publicly available through the API and on the B2SHARE website.
B2SHARE supports two types of records: draft records and published records.
Draft records are unpublished records that can be freely modified by the owner. This means new files and metadata can be added, and existing ones can be changed or removed.
Published records are final and therefore cannot have any files added or changed. Metadata changes are allowed but discouraged in order to keep the authenticity of the published record.
B2SHARE supports versioning of records: published records have a draft record equivalent which can be freely changed again but is closely
There are two ways of adding metadata to a record using the API: upon creation of a draft record and by providing so-called JSON patches in a request. This can also be done to already published records, but will create a new version of that record with new PIDs and checksums.
JSON patch requests modify the current state of the metadata by either changing, adding or removing fields and values. The structure of the data provided in the patch request must strictly follow the metadata schema of the community.
You can make as many patch requests as necessary before publishing your draft record or new published version.
A JSON patch enables a user to modify the metadata of an existing draft record. It is a set of operations that alter an existing set of metadata fields based on another set of fields. It loosely equals the difference between two sets of metadata, i.e. the old and new contents of metadata of a given draft record.
The operations can be add, remove, replace, copy, move and test. All these operations are valid in B2SHARE, but might not always be applicable or useful.
In Python, the jsonpatch package can be loaded which automates the generation of patches that can be sent to the B2SHARE instance.
The PATCH HTTP method is used to instruct the instance to apply the patch provided in the request data.
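To show what such a patch looks like, here is a deliberately simplified Python sketch that derives add, remove and replace operations from the difference between two flat metadata dictionaries and applies them again. It is not the jsonpatch package and it ignores nested fields and special characters in field names. The metadata field names are invented and do not come from a B2SHARE schema.

```python
import json


def make_patch(old, new):
    """Compare two flat metadata dicts and return JSON patch operations."""
    operations = []
    for key in old:
        if key not in new:
            operations.append({"op": "remove", "path": "/" + key})
        elif old[key] != new[key]:
            operations.append({"op": "replace", "path": "/" + key, "value": new[key]})
    for key in new:
        if key not in old:
            operations.append({"op": "add", "path": "/" + key, "value": new[key]})
    return operations


def apply_patch(document, operations):
    """Apply add, replace and remove operations to a copy of a flat dict."""
    result = dict(document)
    for operation in operations:
        key = operation["path"][1:]  # strip the leading "/"
        if operation["op"] == "remove":
            del result[key]
        else:
            result[key] = operation["value"]
    return result


# Hypothetical old and new metadata of a draft record.
old = {"title": "Draft title", "keywords": ["test"], "note": "temporary"}
new = {"title": "Final title", "keywords": ["test"], "creator": "A. Researcher"}

patch = make_patch(old, new)
print(json.dumps(patch, indent=2))
```

Applying the generated patch to the old metadata with `apply_patch(old, patch)` yields the new metadata, which is the sense in which a patch equals the difference between the two sets.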
Files can be added during the draft phase of your new record using a PUT request. Files are uploaded into the file bucket of the draft record, not the draft record itself. Therefore use the file bucket ID found in the metadata of the draft record. All added files have their checksum calculated.
Files are uploaded one-by-one in separate requests. You can upload as many files as necessary before publishing your draft record or published version.
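Uploading files one by one could look roughly like the following Python sketch, which prepares one PUT request per file without sending it. The `/api/files/` path prefix follows the file listing example in this deck. The `application/octet-stream` content type, the bucket ID, the token and the file contents are assumptions made for illustration.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://trng-b2share.eudat.eu"


def file_upload_request(bucket_id, file_name, content, token):
    """Prepare (without sending) a PUT request that uploads one file."""
    url = "{}/api/files/{}/{}?access_token={}".format(
        BASE_URL, quote(bucket_id), quote(file_name), quote(token)
    )
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


# Hypothetical in-memory files; real code would read them from disk.
files = {"data.csv": b"a,b\n1,2\n", "read me.txt": b"hello\n"}

upload_requests = [
    file_upload_request("HYPOTHETICAL_BUCKET", name, body, "HYPOTHETICAL_TOKEN")
    for name, body in files.items()
]
for request in upload_requests:
    print(request.get_method(), request.full_url)
```

File names are percent-encoded with `quote`, so a name containing a space still produces a valid address.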
Draft records are published by altering the value of the publication state in the metadata.
Once your record is published, the included files can not be changed anymore and no new files can be added! Metadata can be changed after publication, but will create a new version of your published record.
After your publication request, the persistent identifiers are automatically added to the record
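A publication request is then just another JSON patch sent with the PATCH method. The Python sketch below prepares such a request without sending it. The field name `publication_state`, the value `submitted`, the `application/json-patch+json` content type and the identifiers are all assumptions for illustration and should be checked against the B2SHARE documentation of the instance in use.

```python
import json
import urllib.request

BASE_URL = "https://trng-b2share.eudat.eu"


def publish_request(record_id, token):
    """Prepare (without sending) a PATCH request that commits a draft record."""
    # Assumed field name and value for the publication state.
    patch = [{"op": "add", "path": "/publication_state", "value": "submitted"}]
    return urllib.request.Request(
        f"{BASE_URL}/api/records/{record_id}/draft?access_token={token}",
        data=json.dumps(patch).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json-patch+json"},
    )


commit = publish_request("HYPOTHETICAL_RECORD_ID", "HYPOTHETICAL_TOKEN")
print(commit.get_method(), commit.full_url)
print(commit.data.decode("utf-8"))
```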
We will now look at some examples of using the API. For many requests you need to authenticate yourself. As APIs are often used directly in applications, you don’t want to provide usernames and passwords for every request you make. Instead an access token is generated on the website which uniquely identifies you during a request.
B2SHARE contains public or open access and restricted data. All open access published records and metadata are public and can be accessed by anyone, this mostly holds for the contained files as well. Therefore no access tokens are required. For privately shared records and your draft records and files an access token is required. Depending on the authorization and community settings, you may or may not be able to access these records.
As the access token uniquely identifies you and allows alteration of your published and draft records, do not share it with anyone!
To generate an access token, go to the B2SHARE website, log in and navigate to the profile page. You can only create an access token using the B2SHARE website.
Create a new token by entering a name, followed by a click on the new token button. Note that you will not be able to see this token again afterwards, so store it safely somewhere else.
If you lose your token you can create a new one on this page. You can create as many as you like.
Here are some example GET requests using the B2SHARE training instance and the curl application. No access token is necessary yet, as all this information is publicly available. Note that every command is a single line; on the slides some commands are wrapped over multiple lines for readability.
Examples:
The examples that retrieve all existing records and list all communities do not use a variable in the address
To search for specific records of a community, use the community ID in the query parameter q. Community IDs can be found in the listing of all communities
Examples:
To get a specific record, use the record ID of that record in the address
To get all your draft records you now need to add your access token as a parameter. You also need to set parameter ‘drafts’ to value 1
To list all the files contained by a record, get the file bucket ID from the record metadata and use it in the address directly in the path
The full publication workflow using the B2SHARE REST API is as follows:
Identify a target community to place your new record under. You need the community ID of that community. Depending on the community, only its members may be allowed to publish under its name.
Get the metadata schema definition of that community using the community ID. Now you know the required fields for your publication
Create a draft record and upload your files and add metadata. This can all be done in multiple separate steps and for files one-by-one.
When adding files, checksums will be generated for each file.
Finally publish your record. Depending on the community’s settings, your record may need approval of the community before it is shown in B2SHARE.
A persistent identifier is added to the record so that it can be uniquely identified.
For clarity, here is a complete overview of the full publication workflow using the B2SHARE REST API in a diagram. For every step (blue boxes) the corresponding request and HTTP method has been added in which the variables need to be filled in. The only exception is the ‘community approve’ step which cannot be done through the API yet.
A record can be in three states: draft, submitted or published (red boxes). Draft and submitted records are not visible openly, but only to the user or community.
Checksums are calculated for every file added and a PID is added once the draft record is committed (green boxes).
Have a look at our website for more information regarding B2SHARE
User documentation is also available here
B2SHARE hands-on training can be found on GitHub. Currently only API access using Python is covered. In the future more modules will be added.