Publishing RDF SKOS with microservices

Publishing RDF SKOS with Java microservices
Fedict – Brussels – Jan 2017

Agenda
• Linked Data
• Resource Description Framework
• Triple stores
• Jena and RDF4j
• Dropwizard

Overview
(Component diagram: Dropwizard, RDF4j, Jetty, Lucene, RDF store, Jersey, Freemarker, Slf4j)

Semantic web
• Making the web machine-readable
• Distributed / web
  • Challenging for queries
  • Data not guaranteed to be available / persistent
• Add meaning to relations / links

Identifier
• Using a URI as identifier
• Dereferenceable URI

Resource Description Framework

RDF Basics
• <Subject S> <Predicate relation P> <Object O>
  • “Triple”
• S and P are resource identifiers (IRI)
  • http://example.com, mailto:john@example.com, …
  • urn:example:1234-56789, ...
• O can be:
  • Identifier (“link” to something else)
  • Literal
    – String value with optional language tag
    – OR typed value (e.g. XSD date, integer, ...)
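The triple structure described on this slide can be sketched in plain Java. This is only an illustration, not how Jena or RDF4j model statements: the class `TripleSketch`, its method names and the `http://example.com/id/john` subject are assumptions made for the example. It renders each statement as one N-Triples line, with IRIs in angle brackets and literals carrying either a language tag or a datatype.

```java
/** Illustrative helper (not part of Jena or RDF4j) that renders triples as N-Triples lines. */
final class TripleSketch {

    /** Subjects and predicates are IRIs; N-Triples writes them between angle brackets. */
    static String iri(String value) {
        return "<" + value + ">";
    }

    /** String literal with an optional language tag (pass null for no tag). */
    static String langLiteral(String text, String lang) {
        String quoted = "\"" + escape(text) + "\"";
        return lang == null ? quoted : quoted + "@" + lang;
    }

    /** Typed literal, e.g. an XSD date or integer. */
    static String typedLiteral(String lexical, String datatypeIri) {
        return "\"" + escape(lexical) + "\"^^" + iri(datatypeIri);
    }

    /** One statement; the object is passed already rendered (IRI or literal). */
    static String triple(String subject, String predicate, String renderedObject) {
        return iri(subject) + " " + iri(predicate) + " " + renderedObject + " .";
    }

    /** Escapes the characters that may not appear raw inside a quoted literal. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    public static void main(String[] args) {
        String john = "http://example.com/id/john"; // hypothetical subject IRI
        // Object is an identifier (a "link" to something else)
        System.out.println(triple(john, "http://xmlns.com/foaf/0.1/mbox", iri("mailto:john@example.com")));
        // Object is a string literal with a language tag
        System.out.println(triple(john, "http://xmlns.com/foaf/0.1/name", langLiteral("John", "en")));
        // Object is a typed literal (XSD date)
        System.out.println(triple(john, "http://purl.org/dc/terms/created",
                typedLiteral("2017-01-15", "http://www.w3.org/2001/XMLSchema#date")));
    }
}
```

In a real service a framework such as Jena or RDF4j would handle this, including stricter IRI and literal validation than the sketch attempts.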

RDF serializations
• RDF is not a file format
  • Although the .rdf extension is often used for RDF/XML
• Popular serializations
  • N-Triples (.nt): fast and easy
  • Turtle (.ttl): human-friendly
  • RDF/XML (.rdf): fits XML workflows
  • JSON-LD (.json): for web developers

Vocabularies
• Based upon RDF Schema
  • Somewhat similar to XML Schema
  • “Classes” and “properties”
  • Can (and should!) be mixed and reused
• Popular vocabularies
  • Dublin Core: generic title, description, …
  • SKOS: broader / narrower terms, …
  • ROV: registered organizations, …
  • http://lov.okfn.org/dataset/lov/
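To make the SKOS broader / narrower idea concrete, here is a small plain-Java sketch that writes two concepts as Turtle. The class `SkosSketch`, the `example.com` IRIs and the labels are made up for the illustration; only the SKOS namespace and the `skos:Concept`, `skos:prefLabel` and `skos:broader` terms come from the vocabulary itself. Labels are assumed to contain no quotes or backslashes, so no escaping is done.

```java
/** Illustrative Turtle writer for SKOS concepts; class name and IRIs are hypothetical. */
final class SkosSketch {

    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";

    /** Prefix declaration that lets the rest of the document use the short skos: form. */
    static String prefixes() {
        return "@prefix skos: <" + SKOS + "> .\n";
    }

    /**
     * Renders one concept. broaderIri may be null for a top-level concept.
     * Assumes the label needs no escaping.
     */
    static String concept(String iri, String label, String lang, String broaderIri) {
        StringBuilder sb = new StringBuilder();
        sb.append("<").append(iri).append("> a skos:Concept ;\n");
        sb.append("    skos:prefLabel \"").append(label).append("\"@").append(lang);
        if (broaderIri != null) {
            sb.append(" ;\n    skos:broader <").append(broaderIri).append(">");
        }
        return sb.append(" .\n").toString();
    }

    public static void main(String[] args) {
        String animal = "http://example.com/id/animal"; // hypothetical concept IRIs
        String cat = "http://example.com/id/cat";
        System.out.print(prefixes());
        System.out.print(concept(animal, "animal", "en", null));
        System.out.print(concept(cat, "cat", "en", animal));
    }
}
```

Only `skos:broader` is written here; the inverse `skos:narrower` link can be added the same way or left for the consumer to infer.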

Notes
• RDF can be generated without a triple store
• Less suitable for:
  • Very large tabular sets (e.g. RDBMS dumps)
  • Tiny sensor data

Jena vs RDF4j
• Both are great open source Java frameworks
  • Reading / writing / converting RDF, triple stores, ...
• Apache Jena
  • https://jena.apache.org/
  • Better performance / more scalable?
• Eclipse RDF4j (formerly Sesame)
  • http://rdf4j.org/
  • Better architecture (“Sails”)?

Why (not) RDF4j as data store
• Embedded store / standalone server
• 100 to 150 million triples
• No out-of-the-box HA / replication
  • Probably not needed for publishing smaller sets
  • Running multiple “shared nothing” instances?
• Bonus: “Sail” abstraction
  • Switch to GraphDB or Blazegraph with minor changes

Triple store vs RDBMS
• Triple stores (TS) are optimized for storing triples
• TS often lack fine-grained checks
  • Few checks for data types, “non-null”
  • Commercial stores like Stardog offer more options
  • Work in progress: https://www.w3.org/TR/shacl/
• Full-text search is often handled by Lucene
  • Often a product-specific extension
• Queries and updates with SPARQL (SQL-like)
  • And / or a custom API: faster but less portable
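As a rough illustration of what a SQL-like SPARQL query looks like, the sketch below builds a SELECT query that lists the labels of all concepts directly narrower than a given SKOS concept. The class `SparqlSketch` and its method are hypothetical, and the query is assembled by string concatenation only to keep the example free of dependencies. Real code would normally use the query and binding facilities of the chosen framework instead.

```java
/** Illustrative SPARQL query builder; the class and method names are hypothetical. */
final class SparqlSketch {

    /**
     * Builds a query for the labels of concepts that have conceptIri as skos:broader.
     * The arguments are pasted in unchecked, which is acceptable for a sketch only.
     */
    static String narrowerLabels(String conceptIri, String lang) {
        return "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
             + "SELECT ?narrower ?label\n"
             + "WHERE {\n"
             + "  ?narrower skos:broader <" + conceptIri + "> ;\n"
             + "            skos:prefLabel ?label .\n"
             + "  FILTER (lang(?label) = \"" + lang + "\")\n"
             + "}\n"
             + "ORDER BY ?label\n";
    }

    public static void main(String[] args) {
        // Hypothetical concept IRI
        System.out.print(narrowerLabels("http://example.com/id/animal", "en"));
    }
}
```

The WHERE clause is a set of triple patterns with variables, which is where the resemblance to SQL ends: there are no tables or joins to declare, only shared variables.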

Popular stores
• Small / medium sets
  • Apache Jena store (part of the framework)
  • Eclipse RDF4j store (part of the framework)
• Larger sets
  • Blazegraph (GPU acceleration in the commercial version)
  • Ontotext GraphDB (free demo)
  • Oracle Spatial and Graph
  • Virtuoso (hybrid XML / RDBMS / TS)

Distributed queries
• SPARQL endpoints
  • Advanced queries
  • Heavy load on server side
• Linked Data Fragments
  • Very basic queries
  • Shifting workload to client
  • More network traffic
  • http://linkeddatafragments.org/concept/
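The trade-off on this slide can be mimicked with an in-memory toy, assuming a server that only answers single triple patterns. This is not the Linked Data Fragments protocol itself, just the division of labour: `fragment` plays the server, and `narrowerLabels` plays a client that has to combine several simple requests on its own. All names and the shortened `skos:` strings are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of server-side pattern matching with client-side joins. */
final class FragmentsSketch {

    /** "Server": matches one triple pattern; a null argument acts as a wildcard. */
    static List<String[]> fragment(List<String[]> store, String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : store) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]))) {
                out.add(t);
            }
        }
        return out;
    }

    /** "Client": joins two patterns itself, issuing one extra request per narrower concept. */
    static List<String> narrowerLabels(List<String[]> store, String concept) {
        List<String> labels = new ArrayList<>();
        for (String[] child : fragment(store, null, "skos:broader", concept)) {
            for (String[] label : fragment(store, child[0], "skos:prefLabel", null)) {
                labels.add(label[2]);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String[]> store = new ArrayList<>();
        store.add(new String[] {"ex:cat", "skos:broader", "ex:animal"});
        store.add(new String[] {"ex:cat", "skos:prefLabel", "cat"});
        store.add(new String[] {"ex:dog", "skos:broader", "ex:animal"});
        store.add(new String[] {"ex:dog", "skos:prefLabel", "dog"});
        System.out.println(narrowerLabels(store, "ex:animal"));
    }
}
```

Each call to `fragment` stands for one HTTP round trip, which is why this approach costs more network traffic while keeping the server cheap to run.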

Microservices
• Mixing REST / SOA / Unix philosophy
  • Do one thing and do it well
  • Back-end
• Also in Java
  • Traditional Java EE is too complex for small apps
  • Pippo, Red Hat WildFly Swarm, Jooby, Ninja, …
  • Using annotations, default config

REST
• HTTP methods
  • GET, PUT, POST, DELETE, PATCH, HEAD, ...
• Content negotiation
  • HTTP request headers (Accept, Accept-Language)
  • Automatically serve different formats using the same URL
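Content negotiation, as mentioned on this slide, boils down to comparing the client's `Accept` header with the formats the service can produce. The sketch below is a simplified, framework-free version of that comparison. A JAX-RS implementation such as Jersey does this automatically and more completely. The class `ConnegSketch` and the order of the supported list are assumptions. Ties are resolved in favour of the type listed first in the header, and the specificity of wildcards is ignored.

```java
import java.util.Locale;

/** Simplified Accept-header negotiation; names and the supported list are illustrative. */
final class ConnegSketch {

    /** Formats this hypothetical service can serve; the first entry is the default. */
    static final String[] SUPPORTED = {
        "text/turtle", "application/ld+json", "application/rdf+xml",
        "application/n-triples", "text/html"
    };

    /** Returns the supported type with the highest q-value, or null if none is acceptable. */
    static String negotiate(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return SUPPORTED[0];
        }
        String best = null;
        double bestQ = 0.0;
        for (String part : acceptHeader.split(",")) {
            String[] fields = part.trim().split(";");
            String type = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String field = fields[i].trim();
                if (field.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(field.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed weight as "not acceptable"
                    }
                }
            }
            String match = match(type);
            if (match != null && q > bestQ) {
                best = match;
                bestQ = q;
            }
        }
        return best;
    }

    /** Maps a requested type or wildcard to the first supported type it covers. */
    private static String match(String type) {
        if (type.equals("*/*")) {
            return SUPPORTED[0];
        }
        for (String s : SUPPORTED) {
            if (s.equals(type)) {
                return s;
            }
            if (type.endsWith("/*") && s.startsWith(type.substring(0, type.length() - 1))) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(negotiate("text/html;q=0.8, application/ld+json"));
    }
}
```

A `null` result would typically be translated into a 406 Not Acceptable response by the surrounding service.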

Dropwizard
• Initially developed by Yammer
  • http://www.dropwizard.io
• Modular but “opinionated”
  • Jetty server, Jersey JAX-RS, Jackson JSON, Metrics
• Very good for REST
  • Less suitable for front-end apps
• Easy deployment
  • One “uberjar” (no need for Docker?)

Notes
• Small hack for file type / language negotiation
  • For a “human-friendly” HTML view
  • Use Jersey UriConnegFilter
• Not intended for multiple vhosts, heavy caching
  • Put a proxy / web server in front
• Authentication
  • Maybe Pac4j (third party): http://www.pac4j.org/
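The file type negotiation hack noted above amounts to turning a URL suffix into a media type, so that a browser user can ask for a specific format without setting request headers. The following plain-Java sketch shows the idea only. It is not the filter's actual implementation, and the class `SuffixMappingSketch` and the `/id/cat` paths are made up for the example. The suffixes follow the serialization list earlier in the deck.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative suffix-to-media-type mapping; not the real filter's code. */
final class SuffixMappingSketch {

    private static final Map<String, String> MEDIA_TYPES = new LinkedHashMap<>();
    static {
        MEDIA_TYPES.put("ttl", "text/turtle");
        MEDIA_TYPES.put("nt", "application/n-triples");
        MEDIA_TYPES.put("rdf", "application/rdf+xml");
        MEDIA_TYPES.put("json", "application/ld+json");
        MEDIA_TYPES.put("html", "text/html");
    }

    /**
     * Returns {path without suffix, media type} for a known suffix,
     * or {unchanged path, null} when normal header-based negotiation should apply.
     */
    static String[] resolve(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot > slash) { // only look at a dot in the last path segment
            String suffix = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            String type = MEDIA_TYPES.get(suffix);
            if (type != null) {
                return new String[] {path.substring(0, dot), type};
            }
        }
        return new String[] {path, null};
    }

    public static void main(String[] args) {
        String[] result = resolve("/id/cat.ttl"); // hypothetical resource path
        System.out.println(result[0] + " as " + result[1]);
    }
}
```

Language negotiation can be handled the same way, with a second table that maps suffixes such as `nl` or `fr` to language tags.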

Thanks !
Bart Hanssens / Fedict
WTC III, Simon Bolivarlaan 30
1000 Brussels, Belgium
@BartHanssens
bart.hanssens [at] fedict.be | www.fedict.belgium.be
