In this hands-on session we will use the scenario demonstrated previously to show how the same results can be achieved programmatically. The ability to use the services via the API is essential for automating the data management process when dealing with large volumes of data, potentially from many different sources. This requires hands-on coding (the demonstrations are given in Python, but confident users may choose their own language). By the end of this session, attendees should understand how to use the B2 services within their own scientific workflows to automate data management.
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
B2SHARE REST API hands-on
Hans van Piggelen, hans.vanpiggelen@surfsara.nl
Thursday July 6th, 2017
This work is licensed under the Creative
Commons CC-BY 4.0 licence
2. Today’s hands-on
Theory: (30 min)
The B2SHARE REST API
B2SHARE concepts, variables, metadata schemas
API: making requests, authentication and payloads
Publication workflow
Practice: (1 hour)
Simple examples
Hands-on exercises
Next hands-on: B2FIND
3. What is the B2SHARE REST API?
The B2SHARE REST API is a set of instructions to interact
with a B2SHARE service instance
The B2SHARE REST API:
Allows direct remote interaction with the B2SHARE service
without using a graphical user interface
Allows integration within application or data processing
workflows for automation of publishing tasks
Supports any programming language or operating system
that supports HTTP requests
There is (almost) no limitation on usage and number of calls,
even for unregistered users
To create new or modify existing content, registration is
required
4. Why use the B2SHARE REST API?
Using the B2SHARE API you can automate:
Creation of records
Uploading of files
Adding/changing metadata
Changing publication state
Retrieval of object data
Enable:
Precise replication of metadata into repository
Large file uploads
Implement publishing in your own workflow or application
Direct data ingest
Ease administration and overview of your records
5. What can I do with the B2SHARE API?
List (all) existing records and communities
Search for specific records and communities
Retrieval of community-specific information
Including community metadata schemas
Create new draft records
Upload files to draft records
Add metadata using metadata schemas
Publish draft records
Modification of the metadata of existing records
More to be added in future releases…
6. Important concepts of B2SHARE
Records:
Contain data files and associated metadata
Connected to a community which possibly maintains it
Metadata:
Set of common fixed metadata fields and custom metadata
blocks with additional fields
Governed by fixed and community metadata schemas
Communities:
Curate datasets which are part of the scientific domain or a
research project
Maintain their own metadata schemas and have community
administrators
States:
Current condition of a record, either draft, submitted or published
Can be changed through the API
7. B2SHARE request variables
B2SHARE defines several request variables that
function as identifiers for objects in B2SHARE
Used in most HTTP request addresses as part of the
path to access specific objects directly
Most important variables:
COMMUNITY_ID: identifier of a user community
RECORD_ID: identifier for a specific record, in either
draft or published state
FILE_BUCKET_ID: identifier for a set of files of a
specific record
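A minimal sketch of how these variables slot into request addresses, assuming the training instance host and the endpoint paths used in this session. The helper names are illustrative and not part of B2SHARE.

```python
from urllib.parse import quote

BASE_URL = "https://trng-b2share.eudat.eu"  # training instance used in this session


def record_url(record_id, draft=False):
    """Address of a record, or of its draft when draft=True."""
    url = f"{BASE_URL}/api/records/{record_id}"
    return f"{url}/draft" if draft else url


def file_url(file_bucket_id, file_key):
    """Address of one file inside the file bucket of a record."""
    # quote() percent-encodes characters such as spaces in the file name
    return f"{BASE_URL}/api/files/{file_bucket_id}/{quote(file_key)}"


def community_schema_url(community_id):
    """Address of the most recent metadata schema of a community."""
    return f"{BASE_URL}/api/communities/{community_id}/schemas/last"


# Placeholder identifiers; substitute real values from your own requests.
print(record_url("RECORD_ID", draft=True))
print(file_url("FILE_BUCKET_ID", "my data.csv"))
print(community_schema_url("COMMUNITY_ID"))
```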
8. Draft records and versioning
Draft records:
Can be updated with new
files and metadata
Have publication state ‘draft’
Published records:
Cannot have files updated
anymore
Metadata updates possible
but discouraged
B2SHARE supports
versioning of records:
Existing published records
can be updated through their
draft counterpart
Creates new PIDs, bucket
IDs, links
[Diagram: Draft (✔ new files, ✔ new records) → Published (✗ new files, ✗ new records) → Draft]
9. Community metadata schemas
Every community defines its own metadata schema
using a hierarchical JSON Schema-based structure
Metadata schemas:
Contain descriptions, vocabularies and expected
structure and format of every metadata field,
including optional fields
May contain community-specific fields
Definitions are publicly available, e.g. EUDAT:
14. Making a request
Requests can be made by using a specific application
directly or by using a programming language that
supports making requests in code
Example applications:
GUI: any file transfer application or web interface
Command line: cURL, wget
Almost all programming languages support making
requests over HTTP
For more complex operations (like publication), a
dedicated interface or command line application is more
useful
15. HTTP requests
A specific call to a service through an API using a method
with address and parameters
An address is a URL:
URL = protocol + hostname + port + path
Protocol: always http:// or https://
Hostname: base address, e.g. b2share.eudat.eu
Port: sometimes required specifically, usually 80 or 443
Path: endpoint specification, e.g. /api/record/1
Optional parameters are additional options given to the
request and are added to the URL
On success the current state of the requested piece of
information is provided
The B2SHARE service accepts payloads along with an HTTP
request, e.g. text, binary data, files
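The parts of a URL listed above can be inspected with Python's standard library. This is a small sketch; the helper name `describe_url` and the default-port table are illustrative choices, not something B2SHARE provides.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # used when the URL names no port


def describe_url(url):
    """Split a URL into protocol, hostname, port, path and parameters."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "parameters": parse_qs(parts.query),
    }


info = describe_url(
    "https://trng-b2share.eudat.eu/api/records?q=community:COMMUNITY_ID"
)
for name, value in info.items():
    print(f"{name}: {value}")
```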
16. HTTP request method
Different methods have different meanings, but it is up to
the service how to process them
Common methods:
GET: requests data from a specified resource
POST: submits data to be processed to a specified
resource
PUT: uploads a representation of the specified URI
PATCH: modify state of specified resource
DELETE: delete a specified resource
17. HTTP responses
Every request returns a status, header and message body, even
when an error occurred
Status line: status code and reason
Header: information on body content
Body: actual response text
Status codes:
1xx: Informational – Request received, continuing process
2xx: Success – Action was successfully received, understood,
and accepted
3xx: Redirection – Further action must be taken in order to
complete the request
4xx: Client Error – Request contains bad syntax or cannot be
fulfilled
5xx: Server Error – Server failed to fulfill an apparently valid
request
HTTP response is pure text, needs interpretation
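Since the response is plain text, code has to check the status code and then parse the body. The sketch below assumes the body is JSON, which is what the examples in this session work with; the function names are illustrative. With the requests package the two inputs would be `r.status_code` and `r.text`.

```python
import json

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(status_code):
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES.get(status_code // 100, "Unknown")


def parse_body(status_code, body_text):
    """Return the parsed JSON body of a successful response, raise otherwise."""
    if status_class(status_code) != "Success":
        raise RuntimeError(
            f"{status_code} ({status_class(status_code)}): {body_text}"
        )
    return json.loads(body_text)


print(status_class(201))
print(status_class(404))
```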
18. HTTP request overview
Client (browser, CLI tool, your app or workflow) sends an HTTP request:
- Request method
- Header
- URL & parameters
- Authentication
- Payloads
Server returns an HTTP response:
- Header
- Status code
- Response text
19. Authentication through the API
The B2SHARE API does not accept username and password
combinations; use tokens for authentication instead!
B2SHARE contains open and restricted data:
Public: all published records and metadata, most files
No access token required
Private: your draft records and files in private records
Only accessible using your access token as parameter
in HTTP request
Access tokens:
Automatically generated unique string of characters
attached to your account in B2SHARE
Only known by the owner, do not share with others!
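A small sketch of passing the token as a request parameter, using the parameter name `access_token` from the examples in this session. The helper name is illustrative. With the requests package, `requests.get(url, params={'access_token': token})` has the same effect.

```python
from urllib.parse import urlencode


def with_access_token(url, token):
    """Append the access token to a URL as the access_token parameter."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({"access_token": token})


# "TOKEN" is a placeholder; never print or share a real token.
print(with_access_token("https://trng-b2share.eudat.eu/api/records", "TOKEN"))
```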
20. Full B2SHARE publication workflow
Publishing in B2SHARE using the API involves multiple
steps:
Identify a target community for your data
Retrieve the metadata schema definition of the
community
The submitted metadata will have to conform to
this schema
Create a draft record:
Upload files into the draft record (one by one)
Add metadata according to schema (possibly in
multiple steps)
Publish the record
21. Full publication workflow diagram
Create draft: POST /api/records
Draft record: GET /api/records/RECORD_ID/draft
Add metadata: PATCH /api/records/RECORD_ID/draft
Add files: PUT /api/files/FILE_BUCKET_ID/FILE_NAME (a checksum is calculated for each file)
Commit: PATCH /api/records/RECORD_ID/draft (a PID is added)
Needs approval?
NO: published record
YES: submitted record, which becomes a published record once the community approves
22. Adding metadata
Metadata is added to a record using the API:
Upon creation of a draft record in a POST request
Or by providing so-called JSON patches in a PATCH
request
Patches modify the current state of the metadata by either
changing, adding or removing fields and values
The structure of the data provided in the patch request must
strictly follow the
metadata schema of the
community
As many patches as necessary
can be applied before
publishing your draft record
23. Adding files
Files can be added during the draft phase of your new
record using a PUT request
Files are uploaded into the file bucket of the draft
record, not the draft record itself
Use the file bucket ID found in the metadata of the
draft record
Files are uploaded one-by-one in separate requests
As many file upload requests
as necessary can be made
before publishing your draft
record
24. Publishing your draft record
Draft records are published by altering the value of the
publication state in the metadata
Once your record is published:
The included files can not be changed anymore!
No new files can be added!
Metadata can be changed after publication, but will
create a new version of your published record
Persistent identifiers are automatically added on
commit
25. Simple examples
Protocol and host: https://trng-b2share.eudat.eu
Application: python (using command-line interface)
HTTP method: GET
import requests
Retrieve all existing records:
r = requests.get('https://trng-b2share.eudat.eu/api/records')
List all communities:
r = requests.get('https://trng-b2share.eudat.eu/api/communities')
Search for specific records of a community:
r = requests.get('https://trng-b2share.eudat.eu/api/records?q=community:COMMUNITY_ID')
26. Complex example
HTTP method: POST
Create draft record ‘My test upload’:
import json
import requests
header = {'Content-Type': 'application/json'}
metadata = {"titles": [{"title": "My test upload"}],
            "community": "e9b9792e-79fb-4b07-b6b4-b9c2bd06d095",
            "open_access": True}
parameters = {'access_token': token}  # token: your API access token
r = requests.post('https://trng-b2share.eudat.eu/api/records/', params=parameters,
                  data=json.dumps(metadata),
                  headers=header)
27. JSON patch
A set of operations that alter an existing set of metadata
fields based on another set of fields
Loosely equals the difference between two sets
Operations: add, remove, replace, copy, move, test
Generated by jsonpatch package
Requires PATCH request to apply to record
[Diagram: Metadata OLD and Metadata NEW are compared to give the Metadata PATCH]
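To make the idea of a patch as "the difference between two sets" concrete, here is a simplified stand-in for what the jsonpatch package generates. It is an assumption-laden sketch: it only compares top-level fields, and the field names `contact_email` and `keywords` are illustrative. In the exercises the jsonpatch package does this work, including nested differences.

```python
import json


def make_flat_patch(old, new):
    """Build JSON patch operations for differences in top-level fields only."""

    def pointer(key):
        # JSON Pointer escaping: "~" becomes "~0" and "/" becomes "~1"
        return "/" + key.replace("~", "~0").replace("/", "~1")

    patch = []
    for key in old:
        if key not in new:
            patch.append({"op": "remove", "path": pointer(key)})
        elif old[key] != new[key]:
            patch.append({"op": "replace", "path": pointer(key), "value": new[key]})
    for key in new:
        if key not in old:
            patch.append({"op": "add", "path": pointer(key), "value": new[key]})
    return patch


# Illustrative metadata only; real records must follow the community schema.
old = {
    "titles": [{"title": "My test upload"}],
    "open_access": True,
    "contact_email": "someone@example.org",
}
new = {
    "titles": [{"title": "My updated upload"}],
    "open_access": True,
    "keywords": ["test"],
}

patch = make_flat_patch(old, new)
print(json.dumps(patch, indent=2))  # this JSON text is the PATCH payload
```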
29. Today’s hands-on exercises
Get and store your API token
Retrieve single record information
Check metadata and included files
Download files and compare checksum
Retrieve existing communities
Retrieve community metadata schema
Investigate metadata schema structure
Create a new draft record
Upload files and metadata
Update and complete metadata
Publish record
30. General instructions
Create an API token on the B2SHARE training website
Requirements for each request:
Request URL and HTTP method (e.g. GET, PUT)
Optional:
Object identifiers (e.g. record, community)
Additional parameters (e.g. your token)
Data payloads (e.g. files or text)
Use requests package for HTTP requests
Use jsonpatch package to create metadata update
patches
B2SHARE API endpoint: /api
31. General instructions
Log in to 145.100.59.156
ssh <user>@145.100.59.156
Use Python or iPython as interface
Helpful links:
Exercises: https://hdl.handle.net/21.T12996/ESS2017-B2SHARE-API
Example image:
https://hdl.handle.net/21.T12996/ESS2017-Image.png
Backup token:
https://hdl.handle.net/21.T12996/token.txt
Ask questions anytime!
32. Getting your access token
Log in on B2SHARE and navigate to profile page:
Create a token by entering a new name:
Click on ‘New token’
33. Saving your token to file
Store the token in a file so it can be restored later:
$ echo "<your token>" > token.txt
Load the token in Python:
with open('token.txt', 'r') as f:
    token = f.read().strip()
34. Exercise 1a: single record retrieval
Endpoint: /api/records/<RECORD_ID>
Method: GET
Response status code: 200
RECORD_ID: 47077e3c4b9f4852a40709e338ad4620
Steps:
Create the URL
Retrieve the object data
Parse the response text
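The three steps can be sketched as follows. To keep the sketch free of third-party dependencies it uses the standard library's `urllib`; with the requests package the equivalent is `requests.get(url).json()`. The `opener` parameter is an illustrative addition so the function can be tried without network access.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://trng-b2share.eudat.eu"
RECORD_ID = "47077e3c4b9f4852a40709e338ad4620"  # record ID given in this exercise


def get_json(url, opener=urlopen):
    """Send a GET request and parse the JSON response text.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Step 1: create the URL
url = f"{BASE_URL}/api/records/{RECORD_ID}"
print(url)

# Steps 2 and 3: retrieve and parse (uncomment to contact the training instance)
# record = get_json(url)
# print(sorted(record))
```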
35. Exercise 1b: process record metadata
Use data of previous record
Steps:
Get the metadata field values
Investigate the file(s) contained
Check if open access and published
Get the file bucket ID, file key(s) and checksum(s)
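A sketch of these steps on an already parsed record. The layout of the record (a `metadata` object plus a `files` list whose entries carry `bucket`, `key` and `checksum`) is an assumption about the response; check it against what your own request returns. The sample record is entirely made up.

```python
def summarize_record(record):
    """Collect the values asked for in this exercise from a parsed record."""
    metadata = record["metadata"]
    return {
        "title": metadata["titles"][0]["title"],
        "open_access": metadata.get("open_access"),
        "published": metadata.get("publication_state") == "published",
        "files": [
            {"bucket": f["bucket"], "key": f["key"], "checksum": f["checksum"]}
            for f in record.get("files", [])
        ],
    }


# Hypothetical record shaped like the assumed response; all values are made up.
sample_record = {
    "id": "RECORD_ID",
    "metadata": {
        "titles": [{"title": "Example record"}],
        "open_access": True,
        "publication_state": "published",
    },
    "files": [
        {
            "bucket": "FILE_BUCKET_ID",
            "key": "example.txt",
            "checksum": "md5:5d41402abc4b2a76b9719d911017c592",
            "size": 5,
        }
    ],
}

summary = summarize_record(sample_record)
print(summary)
```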
36. Exercise 1c: download file and check
Endpoint: /api/files/<FILE_BUCKET_ID>/<FILE_KEY>
Method: GET
Response status code: 200
Steps:
Create the URL
Download files
Calculate checksum and compare
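For the last step, a checksum of the downloaded file can be calculated with `hashlib` and compared with the value stored in the record. This sketch assumes the stored checksum has the form `algorithm:hexdigest` (for example `md5:...`); adjust the parsing if your record shows a different format.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large downloads do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare a file against a checksum given as 'algorithm:hexdigest'."""
    algorithm, _, hexdigest = expected.partition(":")
    return file_checksum(path, algorithm) == hexdigest.lower()
```

After downloading a file to, say, `example.txt`, calling `checksum_matches('example.txt', checksum_from_record)` tells you whether the download is intact.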
37. Exercise 2a: communities retrieval
Endpoint: /api/communities
Method: GET
Response status code: 200
Steps:
Get all communities
Parse the response text
Locate the EUDAT community and its ID
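Locating a community in the parsed listing could look like this. The nesting `hits` → `hits` and the `name` and `id` keys are assumptions about the response layout; the sample listing and its identifiers are made up.

```python
def find_community(listing, name):
    """Return the community with the given name, or None if it is absent."""
    for community in listing["hits"]["hits"]:
        if community["name"] == name:
            return community
    return None


# Hypothetical listing shaped like the assumed response; IDs are made up.
sample_listing = {
    "hits": {
        "total": 2,
        "hits": [
            {"id": "00000000-0000-0000-0000-000000000001", "name": "Example"},
            {"id": "00000000-0000-0000-0000-000000000002", "name": "EUDAT"},
        ],
    }
}

eudat = find_community(sample_listing, "EUDAT")
print(eudat["id"])
```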
38. Exercise 2b: EUDAT community records
Endpoint:
/api/records?q=community:<COMMUNITY_ID>
Method: GET
Response status code: 200
COMMUNITY_ID: <>
Steps:
Set required parameters for request
Determine number of records
Show first record
39. Exercise 2c: EUDAT community metadata schema
Endpoint:
/api/communities/<COMMUNITY_ID>/schemas/last
Method: GET
Response status code: 200
COMMUNITY_ID: <>
Steps:
Determine number of metadata fields
Determine required fields
Determine community-specific fields
Determine metadata field structure
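Because the schemas are JSON Schema based, the standard `properties` and `required` keywords answer most of these questions. The sketch below works on any JSON Schema object; where that object sits inside the B2SHARE response is something to find out in the exercise. The sample schema is a made-up fragment, not the EUDAT schema.

```python
def describe_schema(schema):
    """Summarise the fields of a JSON Schema object definition."""
    properties = schema.get("properties", {})
    required = schema.get("required", [])
    return {
        "field_count": len(properties),
        "required": sorted(required),
        "optional": sorted(set(properties) - set(required)),
        "types": {name: spec.get("type") for name, spec in properties.items()},
    }


# Made-up schema fragment for illustration only.
sample_schema = {
    "type": "object",
    "properties": {
        "titles": {"type": "array", "description": "Title(s) of the record"},
        "community": {"type": "string"},
        "open_access": {"type": "boolean"},
        "keywords": {"type": "array"},
    },
    "required": ["titles", "community", "open_access"],
}

overview = describe_schema(sample_schema)
print(overview)
```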
40. Exercise 3a: create draft record
Endpoint: /api/records/
Method: POST
Response status code: 201
Steps:
Prepare header and payloads
Get API token
Get draft record ID
Check publication state
Get file bucket ID
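Once the POST request has returned status 201, the last three steps read values from the parsed response. This sketch assumes the response carries the record ID under `id`, the state under `metadata.publication_state`, and a `links.files` URL that ends in the file bucket ID; verify these against your own response. The sample response is made up.

```python
def draft_details(response_json):
    """Extract record ID, publication state and file bucket ID of a new draft."""
    files_link = response_json["links"]["files"]
    return {
        "record_id": response_json["id"],
        "publication_state": response_json["metadata"]["publication_state"],
        # the bucket ID is taken to be the last path segment of the files link
        "file_bucket_id": files_link.rstrip("/").split("/")[-1],
    }


# Hypothetical response shaped like the assumed layout; identifiers are made up.
sample_response = {
    "id": "RECORD_ID",
    "metadata": {
        "titles": [{"title": "My test upload"}],
        "publication_state": "draft",
    },
    "links": {
        "self": "https://trng-b2share.eudat.eu/api/records/RECORD_ID/draft",
        "files": "https://trng-b2share.eudat.eu/api/files/FILE_BUCKET_ID",
    },
}

details = draft_details(sample_response)
print(details)
```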
41. Exercise 3b: add files
Endpoint: /api/files/<FILE_BUCKET_ID>/<FILE_KEY>
Method: PUT
Response status code: 200
FILE_BUCKET_ID: <draft record’s file bucket ID>
FILE_KEY: <your file name>
Steps:
Prepare header
Open file handle
Send file with request
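A sketch of preparing the upload request. It builds the request with the standard library so it can be inspected before anything is sent; the `application/octet-stream` content type and the helper name are assumptions. With the requests package the same upload is `requests.put(url, data=open(path, 'rb'), params={'access_token': token}, headers=header)`, which also streams large files instead of reading them into memory.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://trng-b2share.eudat.eu"


def build_upload_request(file_bucket_id, file_key, data, token):
    """Prepare a PUT request that uploads bytes into a draft's file bucket."""
    url = (
        f"{BASE_URL}/api/files/{file_bucket_id}/{quote(file_key)}?"
        + urlencode({"access_token": token})
    )
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/octet-stream",  # raw file content
    }
    return Request(url, data=data, headers=headers, method="PUT")


# Placeholder values; use your draft's bucket ID, your file and your token.
request = build_upload_request("FILE_BUCKET_ID", "example.txt", b"hello", "TOKEN")
print(request.get_method(), request.full_url.split("?")[0])

# To send it: urllib.request.urlopen(request)
```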
42. Exercise 3c: add metadata
Endpoint: /api/records/<RECORD_ID>/draft
Method: PATCH
Response status code: 200
RECORD_ID: <draft record ID>
Steps:
Prepare header
Prepare JSON patch
Send patch with request
43. Exercise 3d: publish record
Endpoint: /api/records/<RECORD_ID>/draft
Method: PATCH
Response status code: 200
RECORD_ID: <draft record ID>
Steps:
Prepare header
Prepare JSON patch
Send patch with request
Check publication state of record
Check record in web browser
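Exercises 3c and 3d both send a JSON patch to the draft endpoint; only the patch content differs. The sketch below prepares such a request without sending it. Three things are assumptions to verify against the B2SHARE documentation: the `application/json-patch+json` content type, the illustrative `keywords` field, and the value `submitted` for requesting publication.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://trng-b2share.eudat.eu"


def build_patch_request(record_id, patch, token):
    """Prepare a PATCH request that applies a JSON patch to a draft record."""
    url = (
        f"{BASE_URL}/api/records/{record_id}/draft?"
        + urlencode({"access_token": token})
    )
    headers = {"Content-Type": "application/json-patch+json"}
    body = json.dumps(patch).encode("utf-8")
    return Request(url, data=body, headers=headers, method="PATCH")


# Exercise 3c: add a metadata field (field name is illustrative)
metadata_patch = [{"op": "add", "path": "/keywords", "value": ["test"]}]

# Exercise 3d: change the publication state (value is an assumption)
publish_patch = [{"op": "add", "path": "/publication_state", "value": "submitted"}]

request = build_patch_request("RECORD_ID", publish_patch, "TOKEN")
print(request.get_method(), request.data.decode("utf-8"))

# To send it: urllib.request.urlopen(request)
```

With the requests package the equivalent call is `requests.patch(url, data=json.dumps(patch), params={'access_token': token}, headers=header)`.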
44. For more info: https://eudat.eu/services/b2share
B2SHARE User Documentation:
https://eudat.eu/services/userdoc/b2share
B2SHARE Training presentations:
https://www.eudat.eu/b2share-training-suite
B2SHARE hands-on training:
https://github.com/EUDAT-Training/B2SHARE-Training
45. www.eudat.eu
Authors Contributors
Hans van Piggelen, SURFsara
Thank you!
Editor's Notes
Who knows what B2SHARE is?
Who knows what an API is?
Who has used an API before?
Who has accessed an API using Python before?
The next three slides may be common knowledge, so we can skip them if everybody knows about these. API stands for Application Programming Interface, and is a specification of a set of definitions, protocols and tools which allow interaction with a service, possibly from a remote location. An API provides an abstraction layer over the underlying service technologies, which the service itself and external applications use to communicate with the technology without knowing exactly how it works. This allows machines to easily interact with the service and correctly handle responses and possible errors.
When it comes to APIs for the Internet, communication is done through HTTP requests which return machine-readable structures of data representing the current state of a piece of information. The API can often be used to alter information on the server side as well. The returned data is usually provided in specific formats like XML or JSON. In many cases this can be changed upon request.
Your own browser is using the same mechanisms to get data from a server and present it to the user.
B2SHARE has a web-based GUI, but it also offers a REST API. The B2SHARE API is a set of instructions to interact with a B2SHARE service instance, for example the one of EUDAT. It provides direct interaction with the service without using a graphical user interface. Instead a command line tool or an application that integrates the communication is used. For example, the tools used in a research workflow might include automatic upload of final data in order to directly publish it. Most programming languages have built-in functionality to interact with a remote service using HTTP requests.
The B2SHARE REST API can be used by any user without registration and without limitations. The only functionality that requires registration is the creation and modification of records and defined communities.
Using the B2SHARE API, the user can do several important operations such as listing all existing records and communities that are currently defined in the service. When a user is looking for specific records and/or communities the query functionality can be used and filtered using several parameters.
Once a community has been chosen, detailed community-specific information, including community metadata schema definitions and other requirements can be retrieved.
New draft records can be created, including the upload of files and the addition of metadata in accordance to the metadata schema defined by a community or project. Once a draft record is completed, it can be published. Furthermore, existing published records can be modified in terms of metadata and published as a new versioned record. The old version will always be available and citable.
There are several important concepts used in B2SHARE which are relevant to using the API.
First there are communities that curate datasets which are part of the scientific domain or a research project. Community administrators maintain the metadata schema of the community. Users have to select the community when creating new records in order to have them connected to that community.
Records contain data files and associated metadata and are always connected to a specific community. Communities actively curate records that are published under their name. When publishing under the EUDAT general community, this is not the case.
Metadata are a set of common fixed metadata fields and custom metadata blocks with additional fields. They are governed by fixed and community metadata schemas. When choosing a specific community, there might be additional metadata fields to be filled in.
A state is the current condition of a record, either draft, submitted or published. It can be changed through the API. Only published records are visible in B2SHARE. Draft and submitted records are only visible to the owner and possibly to the administrator of the community under which the record is published.
B2SHARE defines several request variables that function as identifiers for objects in B2SHARE. They are used in most HTTP request addresses as part of the path to access specific objects directly
The most important variables are the community ID which uniquely identifies a community. Record IDs identify specific records, either in draft or published state.
The file bucket ID is used to identify a set of files of a specific record
All records are published under a specific community and have metadata added according to the corresponding community metadata schema in case this is required.
Metadata schemas contain the descriptions, vocabularies and expected structure and format of every metadata field in the schema. Furthermore, they define which fields are mandatory to fill in during the creation of new records.
Community metadata schema definitions are publicly available through the API and on the B2SHARE website.
To actually make a request, use a designated tool or application, or integrate it into your own application using a programming language which supports this. Examples are GUI applications like file transfer tools or web interfaces on websites. On the command line typical examples are curl and wget.
Most programming languages support making requests over HTTP, but often require the inclusion of packages which provide this functionality.
For more complex operations (like publication), a dedicated interface or command line application is required and therefore a file transfer tool often doesn’t suffice.
APIs are used by making HTTP requests. These are specific calls to a service using a supported method with address and parameters. The address is structured as a URL which consists of a protocol, hostname, port (often not necessary) and a path which possibly identifies the piece of information requested. The path is often called the API endpoint. Multiple endpoints can exist for a given service.
Parameters are additional options given to the request which can further filter the return information or specify for example the return format.
All requests return a response, provided that the server is available, even when the request is malformed or some other error occurred. Depending on the success, the response contains an HTTP status code and a response text with further information.
On success the current state of the requested piece of information is provided.
We will now look at some examples of using the API. For many requests you need to authenticate yourself. As APIs are often used directly in applications, you don’t want to provide usernames and passwords for every request you make. Instead an access token is generated on the website which uniquely identifies you during a request.
B2SHARE contains public or open access and restricted data. All open access published records and metadata are public and can be accessed by anyone, this mostly holds for the contained files as well. Therefore no access tokens are required. For privately shared records and your draft records and files an access token is required. Depending on the authorization and community settings, you may or may not be able to access these records.
As the access token uniquely identifies you and allows alteration of your published and draft records, do not share it with anyone!
The full publication workflow using the B2SHARE REST API is as follows:
Identify a target community to place your new record under. You need the community ID of that community and depending on the community, only members of that community can publish under their name.
Get the metadata schema definition of that community using the community ID. Now you know the required fields for your publication
Create a draft record and upload your files and add metadata. This can all be done in multiple separate steps and for files one-by-one.
When adding files, checksums will be generated for each file.
Finally publish your record. Depending on the community’s settings, your record may need approval of the community before it will be shown in B2SHARE.
A persistent identifier is added to the record so that it can be uniquely identified.
For clarity, here is a complete overview of the full publication workflow using the B2SHARE REST API in a diagram. For every step (blue boxes) the corresponding request and HTTP method have been added, in which the variables need to be filled in. The only exception is the ‘community approve’ step which can’t be done through the API yet.
A record can be in three states: draft, submitted or published (red boxes). Draft and submitted records are not visible openly, but only to the user or community.
Checksums are calculated for every file added and a PID is added once the draft record is committed (green boxes).
There are two ways of adding metadata to a record using the API: upon creation of a draft record and by providing so-called JSON patches in a request. This can also be done to already published records, but will create a new version of that record with new PIDs and checksums.
JSON patch requests modify the current state of the metadata by either changing, adding or removing fields and values. The structure of the data provided in the patch request must strictly follow the metadata schema of the community.
You can make as many patch requests as necessary before publishing your draft record or new published version.
Files can be added during the draft phase of your new record using a PUT request. Files are uploaded into the file bucket of the draft record, not the draft record itself. Therefore use the file bucket ID found in the metadata of the draft record. All added files have their checksum calculated.
Files are uploaded one-by-one in separate requests. You can upload as many files as necessary before publishing your draft record or published version.
Draft records are published by altering the value of the publication state in the metadata.
Once your record is published, the included files can not be changed anymore and no new files can be added! Metadata can be changed after publication, but will create a new version of your published record.
After your publication request, the persistent identifiers are automatically added to the record
Here are some example GET requests using the B2SHARE training instance and Python with the requests package. No access token is necessary yet, as all this information is publicly available.
Examples:
The examples that retrieve all existing records and list all communities do not use a variable in the address
To search for specific records of a community, use the community ID in the query parameter q. Community IDs can be found in the listing of all communities
To generate an access token, go to the B2SHARE website, log in and navigate to the profile page.
Create new token by entering a name, followed by a click on the new token button. Note that you will not be able to retrieve this token, so store it safely.
If you lose your token you can create a new one on this page. You can create as many as you like.
Have a look at our website for more information regarding B2SHARE
User documentation is also available here
B2SHARE hands-on training can be found on GitHub. Currently only API access using Python is covered. In the future more modules will be added.