grlc: Bridging the Gap Between RESTful APIs and Linked Data

BRIDGING THE GAP BETWEEN
RESTFUL APIS AND LINKED DATA
Albert Meroño-Peñuela
Rinke Hoekstra
& many others
CLARIAH Tech Day
07-10-2016

Vrije Universiteit Amsterdam
ACCESSING LINKED DATA

• Multiple Linked Data consuming applications
• Variety of access interfaces needed

GITHUB AS A HUB OF SPARQL QUERIES

• One .rq file per SPARQL query
• Good support for query curation processes
> Versioning
> Branching
> Clone-pull-push
• Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable (raw.githubusercontent.com)

Rinke: this is an asset in itself. We need to be able to keep the queries we use to answer research questions, for reproducibility.

MEANWHILE IN THE SEMANTIC WEB…

• Linked Data APIs emerge
• RESTful entry point to Linked Data hubs for Web applications
• OpenPHACTS
• …but the Linked Data API (e.g. Swagger spec, code itself) still needs to be coded and maintained

• Cousin of BASIL in a SALAD :)
• Same basic principle: 1 SPARQL query = 1 API operation
• Automatically builds Swagger spec and UI from SPARQL
But:
• External query management
• Organization of SPARQL queries in the GitHub repo matches organization of the API
• Thin layer: nothing stored server-side
• Maps
> GitHub API
> Swagger spec
Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like Linked Data APIs’. SALAD, ESWC (2016)
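The "1 SPARQL query = 1 API operation" principle can be sketched as a mapping from the .rq files in a repository to operation names. This is only an illustration, not grlc's actual code: the function name and the file list are hypothetical, and it assumes the operation name is the file name without its .rq extension.

```python
from pathlib import PurePosixPath


def operations_from_files(file_names):
    """Map operation name -> file name for every .rq file (assumed convention)."""
    operations = {}
    for name in file_names:
        path = PurePosixPath(name)
        if path.suffix == ".rq":
            # Assumption: the operation is named after the file stem.
            operations[path.stem] = name
    return operations


# Hypothetical repository listing; only the .rq files become operations.
repo_files = ["README.md", "houseType_all.rq", "MyQuery.rq"]
print(operations_from_files(repo_files))
```

Because the mapping is derived from the repository contents, reorganising the queries in GitHub reorganises the API without touching any server-side code.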

MAPPING GITHUB AND SWAGGER

SPARQL DECORATOR SYNTAX
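
Decorators are special comment lines inside the .rq file, such as the `#+ pagination: 100` line used for pagination in this deck. A minimal sketch of how such lines could be read is given here; the parser and the `summary` key in the example query are illustrative assumptions, not grlc's own implementation.

```python
def parse_decorators(query_text):
    """Collect every '#+ key: value' comment line found in a SPARQL query."""
    decorators = {}
    for line in query_text.splitlines():
        line = line.strip()
        if line.startswith("#+"):
            # Split on the first colon only, so values may contain colons (e.g. URLs).
            key, _, value = line[2:].partition(":")
            decorators[key.strip()] = value.strip()
    return decorators


# Hypothetical query file; only the pagination decorator appears in the slides.
EXAMPLE_QUERY = """#+ summary: List all house types
#+ pagination: 100

SELECT DISTINCT ?houseType WHERE { ?s ?p ?houseType }
"""

print(parse_decorators(EXAMPLE_QUERY))
```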

THE GRLC SERVICE
• Assuming your repo is at https://github.com/:owner/:repo and your grlc instance at :host,
> http://:host/api/:owner/:repo/spec returns the JSON Swagger spec
> http://:host/api/:owner/:repo/api-docs returns the Swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls the operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
• Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference queries, get the SPARQL, and parse it
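
The URL patterns above can be written out as small helper functions. This is a sketch for a client of a grlc instance: the function names are hypothetical, and the host, owner and repo values in the usage line are taken from the pagination example later in the deck.

```python
from urllib.parse import urlencode


def spec_url(host, owner, repo):
    """URL of the JSON Swagger spec for a repository."""
    return f"http://{host}/api/{owner}/{repo}/spec"


def operation_url(host, owner, repo, operation, params=None):
    """URL that calls one operation, with optional query parameters."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def raw_query_url(owner, repo, file_name, branch="master"):
    """URL from which the raw .rq file can be dereferenced."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_name}"


print(operation_url("localhost:8088", "CEDAR-project", "Queries",
                    "houseType_all", {"page": 3}))
```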

DROPDOWNS
• Fills in the swag[paths][op][method][parameters][enum] array
• Uses the de-contextualized triple pattern of the SPARQL query’s BGP against the same SPARQL endpoint
• Very inefficient
• JSON spec caching via reverse proxy
• LOD cache
• Own dimension/codelist cache
• Unmapped parameter ambiguity if the user wants to mix enum with arbitrary parameter values (“all values”)
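
Filling the enum array could look roughly like the sketch below. It assumes a Swagger 2.0-style spec in which `parameters` is a list of objects with a `name` key, and it takes the candidate values as an argument instead of querying the SPARQL endpoint; the helper name, the `houseType` parameter and the values are all hypothetical.

```python
def set_enum(swag, op, method, param_name, values):
    """Attach a sorted, de-duplicated enum list to one named parameter."""
    for param in swag["paths"][op][method]["parameters"]:
        if param["name"] == param_name:
            param["enum"] = sorted(set(values))
            return swag
    raise KeyError(f"no parameter named {param_name!r} on {method} {op}")


# Hypothetical spec fragment and values (in grlc the values come from the endpoint).
swag = {
    "paths": {
        "/houseType_all": {
            "get": {"parameters": [{"name": "houseType", "in": "query", "type": "string"}]}
        }
    }
}
set_enum(swag, "/houseType_all", "get", "houseType", ["farm", "boat", "farm"])
print(swag["paths"]["/houseType_all"]["get"]["parameters"][0]["enum"])
```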

CONTENT NEGOTIATION
• API endpoints can now end with .content_type (e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
• Supports .csv, .json, .html (can be extended)
• grlc sets the ‘Accept’ HTTP header and agnostically returns the same ‘Content-Type’ as the SPARQL endpoint
• Up to the SPARQL endpoint to accept it
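
Extension-based content negotiation can be sketched as a lookup from the URL suffix to an Accept header value. The table and function name below are assumptions for illustration; in particular, the exact media types grlc sends for each extension are not stated in the slides.

```python
import posixpath

# Assumed mapping; the slides only list the supported extensions.
ACCEPT_BY_EXTENSION = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_content_type(path):
    """Return (operation path, Accept value), or (path, None) if no known extension."""
    stem, ext = posixpath.splitext(path)
    if ext in ACCEPT_BY_EXTENSION:
        return stem, ACCEPT_BY_EXTENSION[ext]
    return path, None


print(split_content_type("/CLARIAH/wp-queries/MyQuery.csv"))
```

The Accept value is then forwarded to the SPARQL endpoint, which decides whether it can serve that format.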

PAGINATION
• Large query results are typically nasty for consuming applications
• Split the result into multiple parts (or “pages”)
• Size? #+ pagination: 100
• Navigating pages
• rel=next,prev,first,last links in the HTTP headers (GitHub API Traversal convention)
• Extra request parameter ?page (defaults to 1)
~ curl -X GET -H "Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
~ curl -X GET -H "Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
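
The Link headers in the two responses above follow a simple rule, sketched here. The function is a hypothetical reconstruction from the example output, not grlc's code; its behaviour on the last page (prev, first and last only) is an assumption, since the slides do not show that case.

```python
def link_header(base_url, page, last_page):
    """Build a Link header value with next/prev/first/last relations."""
    def link(target, rel):
        return f"<{base_url}?page={target}>; rel={rel}"

    links = []
    if page < last_page:
        links.append(link(page + 1, "next"))
    if page > 1:
        links.append(link(page - 1, "prev"))
        links.append(link(1, "first"))
    links.append(link(last_page, "last"))
    return ", ".join(links)


BASE = "http://localhost:8088/api/CEDAR-project/Queries/houseType_all"
print(link_header(BASE, 1, 889))
print(link_header(BASE, 3, 889))
```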

CACHE
• Moved implementation outside of grlc (not its direct responsibility)
• grlc sets the HTTP header Cache-Control to public, max-age=900 (15 minutes, customizable)
• nginx caches all grlc-generated JSON (and other static/dynamic assets)
• nginx becomes part of the bundle
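
On the grlc side, caching amounts to emitting one response header and leaving the rest to the reverse proxy. A minimal sketch, with a hypothetical helper name and the 900-second default taken from the slide:

```python
def cache_headers(max_age_seconds=900):
    """Response headers that let a reverse proxy such as nginx cache the reply."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


print(cache_headers())      # default: 15 minutes
print(cache_headers(60))    # customised: 1 minute
```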

CONTAINER RELEASE
• Uses Docker
• Infrastructure-independent install
• Bundles (composes) all required packages (python, python libs, grlc, nginx). Can be easily extended to more
• Publicly available at hub.docker.com
• One-command server deploy: docker pull clariah/grlc

CONCLUSIONS

The spectrum of Linked Data clients: SPARQL-intensive applications vs RESTful API applications
grlc treats decoupling SPARQL from all client applications (including Linked Data APIs, LDA) as a powerful practice
• Separates query curation workflows from everything else
• Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
• Helps you to easily organise your LDA: just organise your SPARQL repository and you’re set
• Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc

Finish with the curl -X GET that gives the result of the original query in the crappy script

THANK YOU!
@ALBERTMERONYO
DATALEGEND.NET
CLARIAH.NL
