DataverseNL as structured data hub


The development of DataverseNL as a Linked Data repository.

Published in: Science
  1. 1. The development of the DataverseNL data repository as a Structured Data Hub. Dataverse Community Meeting, 16 June 2017, Harvard University. Vyacheslav Tykhonov, Peter Doorn & Marion Wittenberg (DANS). DANS is an institute of KNAW and NWO.
  2. 2. DataverseNL facts
     • Started at DANS as a service in 2014
     • In the Netherlands, DataverseNL was installed at Utrecht University in 2010, after which it developed into a shared service of 15 institutions
     • General statistics (June 2017): 227 dataverses, 448 datasets, 1,569 files, 7,151 downloads
  3. 3. DataverseNL metrics for dataverses
  4. 4. DataverseNL metrics for datasets
  5. 5. Value of DataverseNL for the Dutch data landscape
     • DataverseNL is a service that implements best practices for data management
     • The vision of DANS is to store data for ongoing projects in DataverseNL; once a project is finished, the original and produced datasets should go to the Trusted Digital Repository
     • The community is the biggest value of DataverseNL: hundreds of people use this service to deposit data produced in different projects
     • DataverseNL is a collaboration platform that allows researchers from 15 universities and organisations to work together and share the results of their research with the public
     • DataverseNL is a major integration point where datasets from different disciplines, produced by research communities in the Netherlands, come together
     • DataverseNL can serve as the main entrance to tools from Virtual Research Environments (VREs) for various types of objects (data, video, audio), which can be shared between members of the community
  6. 6. Within the context of DANS’ mission, every (digital) object archived via DANS must have a persistent identifier (PID), so that it can be (re)located and cited. DANS uses PIDs for both (digital) objects and people.
     DataverseNL for ongoing research projects
     • every dataset has its own handle (for Dutch universities)
     • revisions of a dataset do not change the handle; a new version changes only the citation
     EASY for permanent data archiving (DOIs)
     • an archived dataset has a DOI
     • every version of a dataset archived from DataverseNL produces a new DOI
     • all metadata is exposed in Dublin Core
  7. 7. Linking DataverseNL to the Semantic Web
     • Our goal is to make DataverseNL dataset metadata available as Linked Open Data (LOD)
     • We are working on markup that uses Schema.org, SKOS and other vocabularies to migrate the existing metadata schemas to the Structured Data Hub
     • The idea is that every metadata field can be described as a “subject/predicate/object” statement (a triple) and linked to the proper vocabulary (ontology)
     • Different disciplines and projects have different controlled vocabularies, so the same metadata can be linked to various ontologies
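The triple idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DANS implementation: the dataset URI, the field names and the `FIELD_TO_PREDICATES` mapping are hypothetical, while the predicate URIs are existing Dublin Core and Schema.org properties. It shows how one metadata field can be linked to more than one vocabulary.

```python
# Hypothetical mapping from local metadata field names to vocabulary predicates.
# One field may be linked to several vocabularies at once.
FIELD_TO_PREDICATES = {
    "title": ["http://purl.org/dc/elements/1.1/title", "https://schema.org/name"],
    "subject": ["http://purl.org/dc/elements/1.1/subject", "https://schema.org/keywords"],
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def metadata_to_triples(subject_uri, metadata):
    """Yield one (subject, predicate, object) tuple per mapped vocabulary term."""
    for field, value in metadata.items():
        for predicate in FIELD_TO_PREDICATES.get(field, []):
            yield (subject_uri, predicate, value)


def to_ntriples(triples):
    """Serialise triples with plain literal objects as N-Triples lines."""
    return "\n".join(
        '<{}> <{}> "{}" .'.format(s, p, escape_literal(o)) for s, p, o in triples
    )


# Placeholder subject URI and values, for illustration only.
dataset_uri = "https://example.org/dataset/1"
metadata = {"title": "Example dataset", "subject": "Archaeology"}
triples = list(metadata_to_triples(dataset_uri, metadata))
print(to_ntriples(triples))
```

Fields without an entry in the mapping are skipped, so a discipline-specific vocabulary could be added by extending the dictionary rather than changing the functions.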
  8. 8. Schema.org introduction
     “Schema.org is an initiative launched on 2 June 2011 by Bing, Google and Yahoo![1][2][3] (then operators of the world's largest search engines)[4] to ‘create and support a common set of schemas for structured data markup on web pages.’” (from Wikipedia)
     • We are trying to link dataverses and datasets from DataverseNL to the proper entities from controlled vocabularies
     • Linked data should go into the Google Knowledge Graph, where it can be queried via Google's API to get triples back
     • To credit people who contribute to their knowledge base, search engines show pages with Linked Data in a special format
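As an illustration of what such structured data markup can look like, the following sketch builds a Schema.org `Dataset` description as JSON-LD and wraps it in the script element commonly used to embed it in a landing page. All values (name, handle, creator, licence URL) are placeholders, and the helper names are hypothetical; the slides do not show the markup DataverseNL actually emits.

```python
import json


def dataset_jsonld(name, identifier, creators, license_url, keywords):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": identifier,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "license": license_url,
        "keywords": keywords,
    }


def as_script_tag(doc):
    """Wrap the JSON-LD in a script element for embedding in an HTML page."""
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n{}\n</script>'.format(body)


# Placeholder values, for illustration only.
doc = dataset_jsonld(
    name="Example dataset",
    identifier="hdl:12345/EXAMPLE",
    creators=["A. Researcher"],
    license_url="https://example.org/licence",
    keywords=["Archaeology"],
)
print(as_script_tag(doc))
```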
  9. 9. Example: 5 stars DataverseNL dataset in Google
  10. 10. Digital preservation in the Long Term Archive
      • DANS has developed a plugin to archive datasets deposited in DataverseNL temporary storage to a Trusted Digital Repository (TDR)
      • Before putting datasets in the Long Term Archive, users should create an account in the TDR and obtain the proper permissions to archive their data
      • The Archival Plugin is open source software and can easily be extended to support any TDR that accepts BagIt packages
  11. 11. Dataset archiving process: an “Archive” button is available to local Dataverse administrators to push datasets to the EASY archive for long-term preservation.
  12. 12. Archived version of the dataset in the TDR: the archived version of the dataset is available on the EASY Trusted Digital Repository landing page and can be cited in research papers.
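To make the BagIt packaging mentioned on this slide concrete, here is a minimal sketch that writes a bag with the three core parts of the format (the `bagit.txt` declaration, a `data/` payload directory and a SHA-256 payload manifest) and then checks the checksums. It is not the DANS Archival Plugin; the function names and the sample file are hypothetical, and a real deposit would normally carry additional tag files.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, files):
    """Write a minimal BagIt bag: bagit.txt, data/ payload, SHA-256 manifest.

    `files` maps payload-relative paths to their bytes content.
    """
    bag = Path(bag_dir)
    (bag / "data").mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(files.items()):
        target = bag / "data" / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append("{}  data/{}".format(digest, rel_path))
    # Bag declaration required by the BagIt specification.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in manifest_lines), encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return True if every payload file matches its manifest checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(None, 1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "example-bag", {"readme.txt": b"hello"})
    print(sorted(p.name for p in bag.iterdir()))
    print(verify_bag(bag))
```

Because the manifest carries a checksum per payload file, a receiving repository can validate the transfer without trusting the sender.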
  13. 13. EASY metadata export to the Linked Open Data cloud
      • An OAI-PMH endpoint exposes metadata in the Dublin Core schema
      • A semantic pipeline converts Dublin Core entities to RDF triples
      • Triples are stored in the Huygens Timbuctoo Linked Data repository (CLARIAH project) and in Virtuoso (DANS research), ready for SPARQL querying
      • Outcome: EASY metadata will become data input for the Linked Open Data repository
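The conversion step in this pipeline can be sketched as follows, assuming the Dublin Core record has already been harvested. The sample record is hand-written in the shape of an `oai_dc` payload, not a real OAI-PMH response; its literal values are borrowed from the RDF example in the deck, but which Dublin Core element each value belongs to is an assumption, as are the subject URI and the function name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written sample shaped like an oai_dc record (element assignment assumed).
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>urn:nbn:nl:ui:13-6o1-kdj</dc:identifier>
  <dc:creator>Pronk, E.C.</dc:creator>
  <dc:subject>Archaeology</dc:subject>
  <dc:language>nl</dc:language>
</oai_dc:dc>
"""


def dc_record_to_triples(xml_text, subject_uri):
    """Turn each Dublin Core element of a record into a (s, p, o) triple."""
    root = ET.fromstring(xml_text.strip())
    triples = []
    for element in root:
        # ElementTree exposes qualified tags as "{namespace}localname".
        if element.tag.startswith("{" + DC_NS + "}") and element.text:
            local_name = element.tag.split("}", 1)[1]
            triples.append((subject_uri, DC_NS + local_name, element.text.strip()))
    return triples


# Placeholder subject URI, for illustration only.
for s, p, o in dc_record_to_triples(SAMPLE_RECORD, "https://example.org/dataset/1"):
    print('<{}> <{}> "{}" .'.format(s, p, o))
```

The resulting triples could then be loaded into a triple store; that loading step is outside this sketch.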
  14. 14. RDF example from EASY (Dublin Core)
      <> <> "eDNA-project=a11387" .
      <> <> " 1262685017456" .
      <> <> "nl" .
      <> <> "Text" .
      <> <> "ISSN=0925-6369" .
      <> <> "Archaeology" .
      <> <> " dataset:31404" .
      <> <> "Zuid-Holland; Noordwijkerhout; Plangebied Klein Leeuwenhorst; Gooweg 45" .
      <> <> "RAAP archeologisch adviesbureau" .
      <> <> "2010-01-05" .
      <> <> "2009-12-29" .
      <> <> "2003-10" .
      <> <> "Plangebied Klein Leeuwenhorst Gemeente Noordwijkerhout Een inventariserend archeologisch onderzoek" .
      <> <> "RAAPNOTITIE 474" .
      <> <> "Pronk, E.C." .
      <> <> "urn:nbn:nl:ui:13-6o1-kdj" .
      <> <> " policy/legal-information/DANSlicenceagreementUK.pdf" .
  15. 15. DataverseNL as Linked Data repository
      A Linked Data object in DataverseNL consists of:
      • metadata with authorship and citation information
      • data usage licence
      • handle as persistent identifier
      • information on how to obtain a key (API token) to start using the API endpoint(s)
      • link to the API endpoint delivering data
      • representation of the API (interactive documentation, Swagger)
      • data provenance
      • controlled vocabularies to meet domain-specific community standards (optional)
      A public demonstration is available on the Dataverse demo website.
  16. 16. Linked Data API endpoint example Source: Dataverse: Object with PID API specification in Swagger
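How a client might combine a handle and an API token can be sketched with the Dataverse native API, which looks up a dataset by persistent identifier and accepts the token in an `X-Dataverse-key` header. This is a stand-in, since the Linked Data endpoint from the demonstration is not specified in the slides; the server name, handle and token below are placeholders, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def build_dataset_request(server, persistent_id, api_token):
    """Build a GET request that looks up a dataset by its persistent identifier."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    url = "{}/api/datasets/:persistentId/?{}".format(server.rstrip("/"), query)
    request = urllib.request.Request(url)
    # The API token identifies the user making the call.
    request.add_header("X-Dataverse-key", api_token)
    return request


# Placeholder server, handle and token, for illustration only.
request = build_dataset_request(
    "https://dataverse.example.org", "hdl:12345/EXAMPLE", "my-api-token"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a real installation.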
  17. 17. Questions?