Linked Open Data and DANS


Experimental LOD infrastructure at Data Archiving and Networked Services (DANS) including Dataverse, Memento protocol and Timbuctoo RDF storage.

Published in: Science
  • @hierohiero It's very simple: Memento lets you travel back in time and find a gateway pointing to the archived dataset whose date is closest to the requested date. Our vision is to use Dataverse as a temporary archive that keeps all versions of a dataset (under the same handle) together with provenance information, so we'll register every version as a Memento.
  • What I find interesting is at the very end (slides 20-21): Memento, and Dataverse as a Memento timegate. But I'm afraid I don't understand at all what this could mean. I would love a plain-language explanation, because I sense this could be useful or important.

  1. DANS is an institute of KNAW and NWO
     Linked Open Data and DANS
     Reinier de Valk, Vyacheslav Tykhonov
     NOTaS meeting, The Hague, 15.12.2017
  2. LOD | Linked (Open) Data?
     • Linked Data (LD) is “a method of publishing structured data so that it can be interlinked and become more useful through semantic queries” [1]
     • Linked Open Data (LOD) is LD that is open, i.e., freely available to use and republish
     • Builds upon standard web technologies, but extends them so that they can be read by machines
     • Semantic web: a web of data that can be processed by machines [1]
  3. LOD | Four principles of LD [2]
     • Use uniform resource identifiers (URIs) as names for things
     • Use HTTP URIs so that people can look up those names
     • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
     • Include links to other URIs, so that they can discover more things
     [2] Berners-Lee, T. (2006) Linked data.
  4. LOD | Building block: the triple
     • The basic building block of LD is the semantic triple (or simply triple)
     • A triple is a statement in the form subject-predicate-object
     • Triples are stored in triplestores (purpose-built databases) or graph databases (databases with a more generalised structure)
     • These databases can be queried with query languages such as SPARQL; this is done using a (SPARQL) endpoint
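The subject-predicate-object pattern can be illustrated without a real triplestore. The sketch below is a minimal in-memory stand-in, not SPARQL itself: the `match` helper and all `ex:` names are hypothetical, and `None` plays the role of a query variable.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example data; the ex: names are illustrative only.
TRIPLES = [
    ("ex:dataset1", "dc:title", "Census 1899"),
    ("ex:dataset1", "dc:creator", "ex:alice"),
    ("ex:dataset2", "dc:creator", "ex:alice"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly what SELECT ?s WHERE { ?s dc:creator ex:alice } asks of an endpoint.
created_by_alice = [s for s, _, _ in match(TRIPLES, p="dc:creator", o="ex:alice")]
```

A real endpoint would add indexing, prefixes and joins across patterns; the point here is only that a query is a triple with some positions left open.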
  5. LOD | The LOD cloud (22.08.2017) [3] [3]
  6. LOD at DANS | Static LOD
     • A LOD graph is living: it keeps evolving
     • We archive static snapshots of the graph
     • LD is in plain ASCII, so no complicated formats are needed
     • The archived static snapshot can be revived; the README file accompanying the data describes the procedure
     • Examples at EASY, the DANS online long-term archiving system [4]
     • Use search term “linked data”
     • Interesting examples: LOD Laundromat; CEDAR RDF database [4]
  7. LOD at DANS | Static LOD
  8. DANS LOD infrastructure
     • LOD conversion tool that harvests public metadata from DANS systems using the OAI-PMH protocol and converts it to Turtle RDF format
     • Virtuoso with a SPARQL endpoint to store and query archived triples (static)
     • grlc to build Web APIs using shared SPARQL queries
     • Timbuctoo Linked Data storage to keep different versions of metadata harvested from DANS systems (turn into schema)
     • GraphQL endpoint integrated in Timbuctoo to query the repository and evaluate new links
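The conversion step in the first bullet can be sketched as follows. This is not the DANS tool, only an illustration under assumptions: the sample record, the subject URI and the `dc_to_turtle` helper are made up, and the harvesting request itself is left out so that the example runs offline on an embedded Dublin Core record of the kind OAI-PMH returns.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical oai_dc payload, standing in for one harvested record.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:creator>Doe, J.</dc:creator>
</oai_dc:dc>
"""


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dc_to_turtle(xml_text: str, subject_uri: str) -> str:
    """Turn each Dublin Core element into one Turtle statement."""
    root = ET.fromstring(xml_text)
    dc_tag_prefix = "{" + DC_NS + "}"
    lines = [f"@prefix dc: <{DC_NS}> .", ""]
    for child in root:
        if child.tag.startswith(dc_tag_prefix) and child.text:
            name = child.tag[len(dc_tag_prefix):]
            value = escape_literal(child.text.strip())
            lines.append(f'<{subject_uri}> dc:{name} "{value}" .')
    return "\n".join(lines) + "\n"


# The subject URI is a placeholder, not a real DANS identifier.
turtle = dc_to_turtle(SAMPLE_RECORD, "https://example.org/dataset/1")
```

The resulting text could then be loaded into a triplestore such as the Virtuoso instance mentioned above.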
  9. What is Timbuctoo?
     • Timbuctoo is an open source Linked Data repository system developed by Huygens ING and specialized in handling interpretative and heterogeneous content. Timbuctoo is specifically designed for academic research in the arts & humanities and is ideally suited for research institutions, libraries and archives supporting scholars who follow a hermeneutic methodology.
     • Data upload options:
         • Excel upload
         • CSV upload
         • Dataperfect upload
         • remote repository upload with ResourceSync
  10. Description of pipeline to archive
      • Users deposit new datasets, and metadata is updated over time
      • Snapshots are taken regularly
      • ResourceSync is the only option for getting an updated snapshot into the LOD cloud without manual interaction
  11. Valid resources
      • Filetypes that can be imported:
          • text/turtle (.ttl)
          • application/rdf+xml (.rdf)
          • application/n-triples (.nt)
          • application/ld+json (.jsonld)
          • text/trig (.trig)
          • application/n-quads (.nq)
          • text/n3 (.n3)
          • application/vnd.timbuctoo-rdf.nquads_unified_diff (.nqud)
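When preparing files for upload, the list above amounts to a lookup from file extension to media type. A minimal sketch, where the table is copied from the slide and the `media_type_for` helper is a hypothetical name, not part of Timbuctoo:

```python
import os
from typing import Optional

# Extension to media type, exactly as listed on the slide.
RDF_MEDIA_TYPES = {
    ".ttl": "text/turtle",
    ".rdf": "application/rdf+xml",
    ".nt": "application/n-triples",
    ".jsonld": "application/ld+json",
    ".trig": "text/trig",
    ".nq": "application/n-quads",
    ".n3": "text/n3",
    ".nqud": "application/vnd.timbuctoo-rdf.nquads_unified_diff",
}


def media_type_for(filename: str) -> Optional[str]:
    """Return the media type for an importable file, or None if unsupported."""
    extension = os.path.splitext(filename)[1].lower()
    return RDF_MEDIA_TYPES.get(extension)
```

Returning `None` for anything outside the list lets a caller reject unsupported files before attempting an upload.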
  12. EASY metadata triples hdl:10411/UQZGXY
  13. DataverseNL public metadata triples hdl:10411/AD7VGI
  14. What is GraphQL?
      • “GraphQL is a data query language developed internally by Facebook in 2012 before being publicly released in 2015. It provides an alternative to REST and ad-hoc webservice architectures.” (Wikipedia)
      • “GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data. GraphQL isn't tied to any specific database or storage engine and is instead backed by your existing code and data.”
      • The GraphQL endpoint provided by Timbuctoo RDF storage allows visual Linked Data exploration.
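To make the idea of a GraphQL request concrete, here is a minimal sketch of the JSON body a client typically POSTs to a GraphQL endpoint. The query is an assumption: the `datasets` and `title` fields are placeholders and do not reflect Timbuctoo's actual schema, which the slides do not show.

```python
import json

# Hypothetical query; field names are placeholders, not Timbuctoo's schema.
QUERY = """
query DatasetTitles($limit: Int!) {
  datasets(limit: $limit) {
    title
  }
}
"""


def build_graphql_payload(query: str, variables: dict) -> str:
    """Serialise a query and its variables into a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


payload = build_graphql_payload(QUERY, {"limit": 5})
```

The client names exactly the fields it wants, and the response mirrors the shape of the query, which is what makes this style convenient for exploring linked data.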
  15. Overview of EASY datasets Demo
  16. EASY dataset in Timbuctoo GraphQL endpoint
  17. N-Quads U.D. (unified diff)
      • RDF data set notations are like snapshots.
      • We enrich them…
      • What if we need to track changes in the resulting new RDF file?
      • How do we know which of these predicates has had a previous value?
      • What if we want to add new triples?
      • N-Quads itself is an extension of N-Triples; Timbuctoo supports both:
          --- easy.nq 2017-12-14 11:18:16.057104790 +0200
          +++ empty.nq 2017-12-14 12:08:18.772264550 +0200
          @@ -1,35652 +0,0 @@
          +<easy:15960> <dc:location> "" .
          +<easy:15960> <dc:location> "" .
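The questions above come down to applying a list of additions and removals to an existing snapshot. The sketch below assumes the usual unified diff convention (`+` adds a statement, `-` removes one, header lines are ignored); how Timbuctoo itself processes `.nqud` files is not shown in the slides, and the sample quads and the `apply_nquads_diff` helper are hypothetical.

```python
from typing import Iterable, Set


def apply_nquads_diff(quads: Iterable[str], diff_text: str) -> Set[str]:
    """Apply '+' and '-' lines of a unified diff to a set of N-Quads statements."""
    result = set(quads)
    for line in diff_text.splitlines():
        # Skip blank lines, the two file headers and hunk headers.
        if not line or line.startswith(("--- ", "+++ ", "@@")):
            continue
        if line.startswith("+"):
            result.add(line[1:])
        elif line.startswith("-"):
            result.discard(line[1:])
    return result


# Hypothetical snapshot and change set.
SNAPSHOT = {
    '<ex:d1> <dc:title> "Old title" <ex:g> .',
    '<ex:d1> <dc:creator> "Doe" <ex:g> .',
}
DIFF = """\
--- old.nq
+++ new.nq
@@ -1,1 +1,1 @@
-<ex:d1> <dc:title> "Old title" <ex:g> .
+<ex:d1> <dc:title> "New title" <ex:g> .
"""

updated = apply_nquads_diff(SNAPSHOT, DIFF)
```

Because the removed line records the previous value, the diff answers the "previous value" question directly, and a diff containing only `+` lines covers the case of adding new triples.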
  18. LOD Archiving
  19. Archived SPARQL endpoint
  20. Memento protocol for Linked Data Fragments Credits: Ruben Verborgh
  21. Dataverse as Memento Timegate
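In plain terms, a timegate receives a requested date and points the client to the archived version nearest to it. A minimal sketch of that selection step, assuming the request date arrives as an `Accept-Datetime` header value in HTTP date format; the version list, the example URLs and the `select_memento` helper are hypothetical and not taken from Dataverse.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# Hypothetical version history for one dataset handle.
VERSIONS: List[Tuple[datetime, str]] = [
    (datetime(2017, 1, 10, tzinfo=timezone.utc), "https://example.org/dataset/v1"),
    (datetime(2017, 6, 1, tzinfo=timezone.utc), "https://example.org/dataset/v2"),
    (datetime(2017, 12, 14, tzinfo=timezone.utc), "https://example.org/dataset/v3"),
]


def select_memento(accept_datetime: str, versions: List[Tuple[datetime, str]]) -> str:
    """Return the URI of the version whose timestamp is closest to the request."""
    requested = parsedate_to_datetime(accept_datetime)
    closest = min(versions, key=lambda version: abs(version[0] - requested))
    return closest[1]


chosen = select_memento("Sat, 01 Jul 2017 00:00:00 GMT", VERSIONS)
```

A real timegate would answer with a redirect to the chosen URI; the sketch covers only the "closest date" rule described in the comments above.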