This document summarizes a presentation about bridging the gap between RESTful APIs and Linked Data using GitHub and SPARQL queries. It discusses how grlc maps GitHub repositories of SPARQL queries to Swagger API specifications and endpoints to provide RESTful access to Linked Data without having to code and maintain separate APIs. Features like content negotiation, pagination, caching and containerization are described to improve the usability and performance of the generated APIs. The presentation concludes by demonstrating how grlc allows flexible organization of SPARQL queries and separation of query curation from client applications.
grlc: Bridging the Gap Between RESTful APIs and Linked Data
1.
BRIDGING THE GAP BETWEEN
RESTFUL APIS AND LINKED DATA
Albert Meroño-Peñuela
Rinke Hoekstra
& many others
CLARIAH Tech Day
07-10-2016
5.
One .rq file per SPARQL query
Good support for query curation
processes
> Versioning
> Branching
> Clone-pull-push
Web-friendly features!
> One URI per query
> Uniquely identifiable
> De-referenceable
(raw.githubusercontent.com)
GITHUB AS A HUB OF
SPARQL QUERIES
6.
Rinke: this is an asset in itself.
We need to be able to keep the queries we use to answer
research questions, for reproducibility.
7. Vrije Universiteit Amsterdam
Linked Data APIs emerge
RESTful entry point to Linked Data hubs for Web applications
OpenPHACTS
…but the Linked Data API (e.g. Swagger spec, code itself) still
needs to be coded and maintained
MEANWHILE IN THE SEMANTIC WEB…
8.
Cousin of BASIL in a SALAD
Same basic principle: 1 SPARQL query = 1
API operation
Automatically builds Swagger spec and UI
from SPARQL
But:
External query management
Organization of SPARQL queries in the
GitHub repo matches organization of the
API
Thin layer – nothing stored server-side
Maps
> GitHub API responses
> to a Swagger spec
Meroño & Hoekstra. ‘grlc Makes GitHub Taste Like
Linked Data APIs’. SALAD, ESWC (2016)
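The "1 SPARQL query = 1 API operation" principle can be sketched as follows. This is a minimal illustration, not grlc's actual code: the function name `build_swagger_paths`, the shape of the generated operation object, and the example file names are assumptions made for the example.

```python
from pathlib import PurePosixPath


def build_swagger_paths(filenames):
    """Map each .rq file in a repository listing to one GET operation.

    Hypothetical helper: the operation name is taken from the file name,
    so the layout of the repository decides the layout of the API.
    """
    paths = {}
    for name in filenames:
        file_path = PurePosixPath(name)
        if file_path.suffix != ".rq":
            continue  # only SPARQL query files become operations
        paths[f"/{file_path.stem}"] = {
            "get": {
                "summary": f"Runs the SPARQL query stored in {name}",
                "parameters": [],
                "responses": {"200": {"description": "Query results"}},
            }
        }
    return paths


# Example repository listing (file names are made up for illustration).
example_paths = build_swagger_paths(["houseType_all.rq", "README.md"])
print(sorted(example_paths))
```

Nothing is stored between calls: the mapping is recomputed from the repository listing, which matches the "thin layer" idea on this slide.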
11. Vrije Universiteit Amsterdam
THE GRLC SERVICE
Assuming your repo is at https://github.com/:owner/:repo
and your grlc instance at :host,
> http://:host/api/:owner/:repo/spec returns the JSON Swagger spec
> http://:host/api/:owner/:repo/api-docs returns the Swagger UI
> http://:host/api/:owner/:repo/:operation?p_1=v_1&...&p_n=v_n
calls the operation with the specified parameter values
> Uses BASIL’s SPARQL variable name convention for query parameters
Sends requests to
> https://api.github.com/repos/:owner/:repo to look for SPARQL queries and their
decorators
> https://raw.githubusercontent.com/:owner/:repo/master/file.rq to dereference
queries, get the SPARQL, and parse it
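The URL scheme above can be written down as a few small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the `master` branch default simply mirrors the raw URL shown on the slide.

```python
from urllib.parse import urlencode


def grlc_urls(host, owner, repo):
    """URLs a grlc instance at `host` exposes for github.com/owner/repo."""
    base = f"http://{host}/api/{owner}/{repo}"
    return {"spec": f"{base}/spec", "api_docs": f"{base}/api-docs"}


def operation_url(host, owner, repo, operation, params=None):
    """URL that calls one operation with the given parameter values."""
    url = f"http://{host}/api/{owner}/{repo}/{operation}"
    if params:
        url += "?" + urlencode(params)
    return url


def github_urls(owner, repo, filename, branch="master"):
    """URLs grlc requests to list a repository and dereference a query."""
    return {
        "listing": f"https://api.github.com/repos/{owner}/{repo}",
        "raw_query": (
            f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{filename}"
        ),
    }


print(operation_url("localhost:8088", "CEDAR-project", "Queries",
                    "houseType_all", {"page": 3}))
```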
12. Vrije Universiteit Amsterdam
DROPDOWNS
• Fills in the
swag[paths][op][method][parameters][enum] array
• Uses the de-contextualized triple
pattern of the SPARQL query’s BGP
against the same SPARQL endpoint
• Very inefficient
• JSON spec caching via reverse proxy
• LOD cache
• Own dimension/codelist cache
• Unmapped parameter ambiguity if
the user wants to mix enum with
arbitrary parameter values (“all
values”)
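The dropdown mechanism can be sketched in two steps: build a query that lists the distinct values of one variable, then store those values in the parameter's enum array. This is one reading of the slide, not grlc's code: the helper names, the example triple pattern, and the example values are assumptions, and the step that actually runs the query against the SPARQL endpoint is left out.

```python
def enumeration_query(variable, triple_pattern):
    """Build a query that lists the distinct values of one variable.

    `triple_pattern` is the pattern mentioning the variable, lifted out of
    the original query (the "de-contextualized" pattern on the slide).
    """
    return f"SELECT DISTINCT ?{variable} WHERE {{ {triple_pattern} }}"


def fill_enum(swag, path, method, param_name, values):
    """Store the allowed values of one parameter in the Swagger spec."""
    for param in swag["paths"][path][method]["parameters"]:
        if param["name"] == param_name:
            param["enum"] = sorted(set(values))
            return param
    raise KeyError(param_name)


# Hypothetical spec fragment and values, for illustration only.
swag = {"paths": {"/houseType_all": {"get": {"parameters": [
    {"name": "type", "in": "query", "type": "string"}]}}}}
print(enumeration_query("type", "?house a ?type"))
fill_enum(swag, "/houseType_all", "get", "type", ["b", "a", "b"])
```

Running one such query per parameter is what makes the approach slow, which is why the slide lists several caching options.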
13. Vrije Universiteit Amsterdam
CONTENT NEGOTIATION
• API endpoints can now
end with .content_type
(e.g. grlc.io/CLARIAH/wp-queries/MyQuery.csv)
• Supports .csv, .json,
.html (can be extended)
• grlc sets the ‘Accept’ HTTP
header on its request and
passes back whatever ‘Content-Type’
the SPARQL endpoint returns
• It is up to the SPARQL
endpoint to honor the
requested type
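The extension-based negotiation can be sketched as a lookup from URL suffix to the value of the ‘Accept’ header. This is an illustration only: the table name, the function name, and the exact MIME strings are assumptions, not taken from grlc.

```python
import posixpath

# Assumed mapping from URL extension to the Accept header value.
EXTENSION_TO_MIME = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".html": "text/html",
}


def split_content_type(operation_path):
    """Return (path without extension, Accept value or None).

    None means the URL carried no known extension, so the caller's own
    Accept header should be left untouched.
    """
    stem, extension = posixpath.splitext(operation_path)
    if extension in EXTENSION_TO_MIME:
        return stem, EXTENSION_TO_MIME[extension]
    return operation_path, None


print(split_content_type("grlc.io/CLARIAH/wp-queries/MyQuery.csv"))
```

Supporting another format then only needs one more entry in the table, provided the SPARQL endpoint can serve it.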
14. Vrije Universiteit Amsterdam
PAGINATION
• Large query results are
typically hard for consuming
applications to handle
• Split the result in multiple
parts (or “pages”)
• Page size? Set it with the
decorator #+ pagination: 100
• Navigating pages
• rel=next,prev,first,last links
in the HTTP headers (GitHub
API Traversal convention)
• Extra request parameter
?page (defaults to 1)
~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18447
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
~ curl -X GET -H"Accept: text/csv" -I http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=3
HTTP/1.0 200 OK
Content-Type: text/csv; charset=UTF-8
Content-Length: 18142
Server: grlc/1.0.0
Link: <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, <http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last
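A client can navigate the pages by reading the Link header. The sketch below parses a header of the shape shown in the curl output; the function names are hypothetical, and the regular expression only covers this simple form (one rel value per link, optionally quoted).

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches entries such as: <http://host/path?page=2>; rel=next
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="?([A-Za-z]+)"?')


def parse_link_header(value):
    """Return a dict mapping each rel (next, prev, first, last) to its URL."""
    return {rel: url for url, rel in LINK_PATTERN.findall(value)}


def page_number(url):
    """Read the ?page parameter of a URL; it defaults to 1 when absent."""
    query = parse_qs(urlparse(url).query)
    return int(query.get("page", ["1"])[0])


header = (
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=4>; rel=next, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=2>; rel=prev, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=1>; rel=first, "
    "<http://localhost:8088/api/CEDAR-project/Queries/houseType_all?page=889>; rel=last"
)
links = parse_link_header(header)
print(page_number(links["next"]), page_number(links["last"]))
```

A consuming application would keep requesting `links["next"]` until the response no longer carries a `next` entry.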
15. Vrije Universiteit Amsterdam
CACHE
• Moved implementation
outside of grlc (not its
direct responsibility)
• grlc sets HTTP header
Cache-Control to public,
max-age=900 (15 minutes,
customizable)
• nginx caches all grlc
generated JSON (and
other static/dynamic
assets)
• nginx becomes part of the
bundle
16. Vrije Universiteit Amsterdam
CONTAINER RELEASE
• Uses docker
• Infrastructure-
independent install
• Bundles (composes) all
required packages
(python, python libs, grlc,
nginx). Can be easily extended
to more
• Publicly available at
hub.docker.com
• One-command server deploy:
docker pull clariah/grlc
17. Vrije Universiteit Amsterdam
The spectrum of Linked Data clients: SPARQL-intensive applications
vs RESTful API applications
grlc decouples SPARQL from all client applications
(including LDAs), which is a powerful practice
Separates query curation workflows from everything else
Allows at the same time
> Web-friendly SPARQL queries
> Web-friendly RESTful APIs
Helps you to easily organise your LDA – just organise your SPARQL
repository and you’re set
Try it out!
> http://grlc.io/
> https://github.com/CLARIAH/grlc
CONCLUSIONS