Publishing RDF SKOS with Java microservices
Fedict – Brussels – Jan 2017

Agenda
• Linked Data
• Resource Description Framework
• Triple stores
• Jena and RDF4j
• Dropwizard

Overview
[Diagram: Dropwizard (Jetty, Jersey, Freemarker, Slf4j) and RDF4j (Lucene, RDF store)]

Linked Data

Semantic web
• Making the web machine-readable
• Distributed / web
  – Challenging for queries
  – Data not guaranteed to be available / persistent
• Add meaning to relations / links

Identifier
• Using URI as identifier
• Dereferenceable URI

Resource Description Framework

RDF Basics
• <Subject S> <Predicate relation P> <Object O>
  – “Triple”
• S and P are resource identifiers (IRI)
  – http://example.com, mailto:john@example.com, …
  – urn:example:1234-56789, ...
• O can be:
  – Identifier (“link” to something else)
  – Literal
    – String value with optional language tag
    – OR typed value (e.g. XSD date, integer, ...)
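
The triple structure above can be sketched as a small Java model. This is an illustration only: `RdfTerm`, `Triple` and `TripleDemo` are hypothetical names, not the RDF4j or Jena API, and the example IRIs and date are made up. Blank nodes are left out because the slide does not cover them.

```java
import java.util.Objects;

/** Builds two example triples about the same (made-up) concept. */
final class TripleDemo {
    public static void main(String[] args) {
        RdfTerm concept = RdfTerm.iri("http://example.com/concept/1");
        Triple label = new Triple(concept,
                RdfTerm.iri("http://www.w3.org/2004/02/skos/core#prefLabel"),
                RdfTerm.literal("Brussel", "nl"));
        Triple modified = new Triple(concept,
                RdfTerm.iri("http://purl.org/dc/terms/modified"),
                RdfTerm.typedLiteral("2017-01-15", "http://www.w3.org/2001/XMLSchema#date"));
        System.out.println(label.object.kind + " / " + modified.object.kind);
    }
}

/** One RDF term: an IRI or a literal. Hypothetical class, not a library type. */
final class RdfTerm {
    enum Kind { IRI, STRING_LITERAL, TYPED_LITERAL }

    final Kind kind;
    final String value;     // the IRI, or the lexical form of a literal
    final String qualifier; // language tag (may be null) or datatype IRI; null for an IRI

    private RdfTerm(Kind kind, String value, String qualifier) {
        this.kind = kind;
        this.value = Objects.requireNonNull(value, "value");
        this.qualifier = qualifier;
    }

    static RdfTerm iri(String iri) {
        return new RdfTerm(Kind.IRI, iri, null);
    }

    /** String literal; pass null as lang when there is no language tag. */
    static RdfTerm literal(String text, String lang) {
        return new RdfTerm(Kind.STRING_LITERAL, text, lang);
    }

    static RdfTerm typedLiteral(String lexicalForm, String datatypeIri) {
        return new RdfTerm(Kind.TYPED_LITERAL, lexicalForm,
                Objects.requireNonNull(datatypeIri, "datatypeIri"));
    }

    boolean isLiteral() {
        return kind != Kind.IRI;
    }
}

/** A triple whose subject and predicate must be IRIs; only the object may be a literal. */
final class Triple {
    final RdfTerm subject;
    final RdfTerm predicate;
    final RdfTerm object;

    Triple(RdfTerm subject, RdfTerm predicate, RdfTerm object) {
        if (subject.isLiteral() || predicate.isLiteral()) {
            throw new IllegalArgumentException("subject and predicate must be IRIs");
        }
        this.subject = subject;
        this.predicate = predicate;
        this.object = Objects.requireNonNull(object, "object");
    }
}
```

In a real application the term and statement classes of RDF4j or Jena would take the place of these hand-written ones.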

RDF serializations
• RDF is not a file format
  – Although the .rdf extension is often used for RDF/XML
• Popular serializations
  – N-Triples (.nt): fast and easy
  – Turtle (.ttl): human-friendly
  – RDF/XML (.rdf): XML workflows
  – JSON-LD (.jsonld, often .json): web developers
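
To show why N-Triples counts as "fast and easy", here is a minimal hand-rolled writer: one triple per line, full IRIs in angle brackets, a closing dot. `NTriplesSketch` is a hypothetical helper, the example IRIs are made up, and the IRIs passed in are assumed to be valid already. RDF4j and Jena ship complete writers for all the serializations listed above.

```java
/** Hand-rolled N-Triples output for illustration; not a library API. */
final class NTriplesSketch {
    /** Escapes the characters that may not appear raw inside an N-Triples string literal. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Triple whose object is another resource. */
    static String iriTriple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    /** Triple whose object is a string literal; lang may be null. */
    static String literalTriple(String s, String p, String text, String lang) {
        String tag = (lang == null) ? "" : "@" + lang;
        return "<" + s + "> <" + p + "> \"" + escape(text) + "\"" + tag + " .";
    }

    public static void main(String[] args) {
        String skos = "http://www.w3.org/2004/02/skos/core#";
        System.out.println(iriTriple("http://example.com/c/apple", skos + "broader",
                "http://example.com/c/fruit"));
        System.out.println(literalTriple("http://example.com/c/apple", skos + "prefLabel",
                "Apple", "en"));
    }
}
```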

Vocabularies
• Based upon RDF Schema
  – Somewhat similar to XML Schema
  – “Classes” and “properties”
  – Can (and should!) be mixed and reused
• Popular vocabularies
  – Dublin Core: generic title, description, …
  – SKOS: broader / narrower term, …
  – ROV: registered organizations, …
  – http://lov.okfn.org/dataset/lov/
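
Mixing vocabularies simply means using predicates from several namespaces on the same resource. The sketch below describes one concept with SKOS and Dublin Core terms side by side. `VocabularyMix` and the example.com IRIs are hypothetical; the namespace IRIs are the published ones for RDF, SKOS and DCMI Terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Describes a concept with terms from more than one vocabulary. */
final class VocabularyMix {
    // Namespace IRIs of the reused vocabularies
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String DCTERMS = "http://purl.org/dc/terms/";

    /** Returns {subject, predicate, object} rows; the object is an IRI or a plain string. */
    static List<String[]> describe(String concept, String label, String description, String broader) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] { concept, RDF + "type", SKOS + "Concept" });
        triples.add(new String[] { concept, SKOS + "prefLabel", label });
        triples.add(new String[] { concept, DCTERMS + "description", description });
        triples.add(new String[] { concept, SKOS + "broader", broader });
        // skos:narrower is the inverse of skos:broader, so it is stated on the parent
        triples.add(new String[] { broader, SKOS + "narrower", concept });
        return triples;
    }

    public static void main(String[] args) {
        List<String[]> triples = describe("http://example.com/c/apple", "Apple",
                "Edible fruit of the apple tree", "http://example.com/c/fruit");
        for (String[] t : triples) {
            System.out.println(String.join(" ", t));
        }
    }
}
```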

Notes
• RDF can be generated without a triple store
• Less suitable for:
  – Very large tabular sets (e.g. RDBMS dumps)
  – Tiny sensor data

Jena and RDF4j

Jena vs RDF4j
• Both are great open source Java frameworks
  – Reading / writing / converting RDF, triple stores, ...
• Apache Jena
  – https://jena.apache.org/
  – Better performance / more scalable?
• Eclipse RDF4j (formerly Sesame)
  – http://rdf4j.org/
  – Better architecture (“Sails”)?

Why (not) RDF4j as data store
• Embedded store / standalone server
• 100-150 million triples
• No out-of-the-box HA / replication
  – Probably not needed for publishing smaller sets
  – Running multiple “shared nothing” instances?
• Bonus: “Sail” abstraction
  – Switch to GraphDB, Blazegraph with minor changes

Triple stores

Triple store vs RDBMS
• Triple stores (TS) are optimized for storing triples
• TS often lack fine-grained checks
  – Few checks for data types, “non-null”
  – Commercial stores like Stardog offer more options
  – Work in progress: https://www.w3.org/TR/shacl/
• Full-text search often handled by Lucene
  – Often a product-specific extension
• Queries and updates with SPARQL (SQL-like)
  – And / or a custom API, faster but less portable

Popular stores
• Small / medium sets
  – Apache Jena store (part of framework)
  – Eclipse RDF4j store (part of framework)
• Larger sets
  – Blazegraph (GPU acceleration in commercial version)
  – Ontotext GraphDB (free demo)
  – Oracle Spatial and Graph
  – Virtuoso (hybrid XML / RDBMS / TS)

Distributed queries
• SPARQL endpoints
  – Advanced queries
  – Heavy load on server side
• Linked Data Fragments
  – Very basic queries
  – Shifting workload to client
  – More network traffic
  – http://linkeddatafragments.org/concept/
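
The trade-off above can be sketched in a few lines. A SPARQL endpoint would evaluate a join on the server; with Linked Data Fragments the server only answers single triple patterns and the client combines the results. In the sketch below, `match` stands in for the server and `labelsOfNarrower` for the client. All names and the prefixed sample data are hypothetical, and HTTP, paging and fragment metadata are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Triple-pattern lookups on the "server", joins on the "client". */
final class PatternQuerySketch {
    /** Made-up sample data as {subject, predicate, object} rows. */
    static List<String[]> sampleData() {
        return Arrays.asList(
                new String[] { "ex:apple", "skos:broader", "ex:fruit" },
                new String[] { "ex:pear", "skos:broader", "ex:fruit" },
                new String[] { "ex:apple", "skos:prefLabel", "Apple" },
                new String[] { "ex:pear", "skos:prefLabel", "Pear" },
                new String[] { "ex:fruit", "skos:prefLabel", "Fruit" });
    }

    /** Server side: all triples matching one pattern; null acts as a wildcard. */
    static List<String[]> match(List<String[]> data, String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : data) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]))) {
                out.add(t);
            }
        }
        return out;
    }

    /** Client side: labels of all concepts directly narrower than the given one. */
    static List<String> labelsOfNarrower(List<String[]> data, String concept) {
        List<String> labels = new ArrayList<>();
        // First request: which concepts have this one as broader term?
        for (String[] link : match(data, null, "skos:broader", concept)) {
            // One extra request per result: this is the added network traffic
            for (String[] label : match(data, link[0], "skos:prefLabel", null)) {
                labels.add(label[2]);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(labelsOfNarrower(sampleData(), "ex:fruit"));
    }
}
```

The server-side method stays trivial and cheap, while the number of requests grows with the size of the intermediate result.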
Dropwizard

Microservices
• Mixing REST / SOA / Unix philosophy
  – Do 1 thing and do it well
  – Back-end
• Also in Java
  – Traditional Java EE too complex for small apps
  – Pippo, Red Hat WildFly Swarm, Jooby, Ninja, …
  – Using annotations, default config
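
As a rough idea of how small a "do 1 thing" back-end can be in Java, the sketch below serves a Turtle description of a concept using only the HTTP server bundled with the JDK (`com.sun.net.httpserver`). It does not use any of the frameworks listed above. `ConceptService`, the `/concept/` path, port 8080 and the example.com IRI are assumptions, and the id is not validated.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** A single-purpose service: GET /concept/{id} returns a one-triple Turtle document. */
final class ConceptService {
    /** The one thing this service does: describe a concept (no input validation here). */
    static String describe(String id) {
        return "<http://example.com/concept/" + id + "> a "
                + "<http://www.w3.org/2004/02/skos/core#Concept> .\n";
    }

    /** Starts the server on the given port and returns it so the caller can stop it. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/concept/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String id = path.substring("/concept/".length());
            byte[] body = describe(id).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/turtle; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // keeps running until the process is stopped
    }
}
```

A framework adds what this sketch lacks: routing by annotation, configuration, logging, metrics and health checks.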

REST
• HTTP methods
  – GET, PUT, POST, DELETE, PATCH, HEAD, ...
• Content negotiation
  – HTTP request header (Accept)
  – Automatically serve different formats using the same URL
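
Content negotiation can be sketched as follows: the server reads the `Accept` request header, picks the supported RDF media type with the highest q-value, and serves that format from the same URL. `ConnegSketch` is a hypothetical helper with a simplified parser (partial wildcards such as `text/*` are ignored). In a JAX-RS framework such as Jersey this selection is done by the framework itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Simplified Accept-header negotiation over a fixed list of RDF media types. */
final class ConnegSketch {
    // Registered media types for Turtle, N-Triples, RDF/XML and JSON-LD; first is the default
    static final List<String> SUPPORTED = Arrays.asList(
            "text/turtle", "application/n-triples", "application/rdf+xml", "application/ld+json");

    /** Returns the supported type with the highest q-value, or null if none is acceptable. */
    static String negotiate(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return SUPPORTED.get(0); // no preference stated
        }
        String best = null;
        double bestQ = 0.0;
        for (String part : acceptHeader.split(",")) {
            String[] fields = part.trim().split(";");
            String type = fields[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < fields.length; i++) {
                String param = fields[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed weight as "not acceptable"
                    }
                }
            }
            String candidate = null;
            if (type.equals("*/*")) {
                candidate = SUPPORTED.get(0);
            } else if (SUPPORTED.contains(type)) {
                candidate = type;
            }
            // Strict comparison: on equal weights the earlier entry wins
            if (candidate != null && q > bestQ) {
                best = candidate;
                bestQ = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(negotiate("text/html;q=0.9, application/ld+json;q=0.8, text/turtle;q=0.5"));
    }
}
```

When `negotiate` returns null, a service would typically answer with HTTP status 406 (Not Acceptable).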

Dropwizard
• Initially developed by Yammer
  – http://www.dropwizard.io
• Modular but “opinionated”
  – Jetty server, Jersey JAX-RS, Jackson JSON, Metrics
• Very good for REST
  – Less suitable for front-end apps
• Easy deployment
  – 1 “uberjar” (no need for Docker?)

Notes
• Small hack for file type / language negotiation
  – For “human-friendly” HTML view
  – Use Jersey UriConnegFilter
• Not intended for multiple vhosts, heavy caching
  – Proxy / web server in front
• Authentication
  – Maybe Pac4j (3rd party): http://www.pac4j.org/
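
The file type negotiation hack amounts to mapping a URL suffix to a media type, so that a browser can request `/concept/42.html` while a client asks for `/concept/42.ttl`. `UriConnegFilter` is a class shipped with Jersey that does this by rewriting the request. The sketch below only illustrates the idea and is not Jersey's implementation; `SuffixNegotiation`, the suffix table and the paths are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Maps a file-type suffix in the request path to a media type. */
final class SuffixNegotiation {
    static final Map<String, String> MEDIA_TYPES = new LinkedHashMap<>();
    static {
        MEDIA_TYPES.put("ttl", "text/turtle");
        MEDIA_TYPES.put("nt", "application/n-triples");
        MEDIA_TYPES.put("rdf", "application/rdf+xml");
        MEDIA_TYPES.put("jsonld", "application/ld+json");
        MEDIA_TYPES.put("html", "text/html");
    }

    /**
     * Returns {path without suffix, media type} for a known suffix,
     * or {unchanged path, null} when the path carries no known suffix.
     */
    static String[] resolve(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot > slash) { // only look at a suffix in the last path segment
            String type = MEDIA_TYPES.get(path.substring(dot + 1));
            if (type != null) {
                return new String[] { path.substring(0, dot), type };
            }
        }
        return new String[] { path, null };
    }

    public static void main(String[] args) {
        String[] result = resolve("/concept/42.ttl");
        System.out.println(result[0] + " as " + result[1]);
    }
}
```

When no suffix is present, the regular `Accept` header negotiation applies.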
Thanks !
Bart Hanssens / Fedict
WTC III, Simon Bolivarlaan 30
1000 Brussels, Belgium
@BartHanssens
bart.hanssens [at] fedict.be | www.fedict.belgium.be
