CV support in the next Dataverse release
Slava Tykhonov
lead software engineer
DANS-KNAW R&D
CESSDA Tools Open Hour: Dataverse, 18.11.2021
DANS Data Stations - Future DANS Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
FAIR and Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Out of the box CV support in Dataverse (1)
Source: Dataverse Metadata Schema
Out of the box CV support in Dataverse (2)
Internal vocabularies are stored in Dataverse; we need more CVs!
Semantic interoperability on the infrastructure level
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata”.
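To illustrate the Semantic API quoted above, the sketch below builds (but does not send) the HTTP requests for reading and updating dataset metadata as JSON-LD. It assumes the `/api/datasets/:persistentId/metadata` endpoint with the `application/ld+json` media type, as we understand it from the Dataverse native API guide; the server URL, DOI and token are placeholders, and the endpoint and its options should be checked against the guide for your Dataverse version.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

JSONLD = "application/ld+json"


def build_metadata_request(server_url: str, persistent_id: str, api_token: str,
                           jsonld_body: Optional[str] = None) -> Request:
    """Build a GET (read) or PUT (update) request for the JSON-LD metadata API.

    Assumed endpoint: {server}/api/datasets/:persistentId/metadata?persistentId=...
    Nothing is sent here; pass the result to urllib.request.urlopen() to execute it.
    """
    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/metadata?{query}"
    headers = {"X-Dataverse-key": api_token}
    if jsonld_body is None:
        # Reading: ask for the flat JSON-LD (OAI-ORE style) representation.
        headers["Accept"] = JSONLD
        return Request(url, headers=headers, method="GET")
    # Updating: send JSON-LD terms in the request body.
    headers["Content-Type"] = JSONLD
    return Request(url, data=jsonld_body.encode("utf-8"), headers=headers, method="PUT")


if __name__ == "__main__":
    req = build_metadata_request("https://dataverse.example.org",
                                 "doi:10.5072/FK2/EXAMPLE", "API-TOKEN")
    print(req.get_method(), req.full_url)
```

The same function covers both directions so that a client never needs to know the metadata block layout: it only exchanges JSON-LD terms.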
Support for external controlled vocabularies is being developed by DANS in the SSHOC project and is already integrated into Dataverse core as of release 5.7.
Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
Building block: Skosmos to host ontologies
● SKOSMOS is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● support for multilingual vocabularies
● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
Skosmos API with a Python module
pip install skosmos-client
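The `skosmos-client` package wraps the Skosmos REST API. To show what such a call looks like on the wire, here is a standard-library-only sketch that builds a search URL. The `/rest/v1/search` path and the `query`, `lang`, `vocab` and `maxhits` parameters follow the Skosmos REST API as we understand it; the base URL and vocabulary id are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode


def skosmos_search_url(base_url: str, query: str, vocab: Optional[str] = None,
                       lang: str = "en", max_hits: int = 10) -> str:
    """Build a Skosmos REST search URL (assumed path: /rest/v1/search).

    `query` may end with a '*' wildcard for prefix matching, which is how an
    autocomplete widget would typically query Skosmos.
    """
    params = {"query": query, "lang": lang, "maxhits": max_hits}
    if vocab:
        # Restrict the search to a single vocabulary id, e.g. "unesco".
        params["vocab"] = vocab
    return f"{base_url.rstrip('/')}/rest/v1/search?{urlencode(params)}"


if __name__ == "__main__":
    print(skosmos_search_url("https://skosmos.example.org/", "educ*", vocab="unesco"))
```

Note that `urlencode` percent-encodes the wildcard (`*` becomes `%2A`), which the server decodes back before searching.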
SKOSMOS API for GRID ontology
Dataverse deposit form with connection to ontologies
Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
One metadata field can be linked to many ontologies
Switching the language in Dataverse changes the language of the suggested terms!
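The language switch can be pictured as choosing, for each suggested concept, the label in the current UI language, with a fallback. The sketch below assumes each suggestion carries a `labels` mapping from language code to label (a hypothetical shape, not the actual Dataverse or Skosmos data model) and merges suggestions from several ontologies linked to one field.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pick_label(labels: Dict[str, str], ui_lang: str,
               fallbacks: Iterable[str] = ("en",)) -> Optional[str]:
    """Return the label in the UI language, else the first available fallback."""
    for lang in (ui_lang, *fallbacks):
        if lang in labels:
            return labels[lang]
    # Last resort: any label at all, or None if the concept has no labels.
    return next(iter(labels.values()), None)


def suggestions_for_field(results_by_vocab: Dict[str, List[dict]],
                          ui_lang: str) -> List[Tuple[str, str, Optional[str]]]:
    """Merge concepts from several vocabularies into (vocab, uri, label) rows."""
    rows = []
    for vocab, concepts in results_by_vocab.items():
        for concept in concepts:
            rows.append((vocab, concept["uri"], pick_label(concept["labels"], ui_lang)))
    return rows
```

Keeping the vocabulary name in each row lets the form show which ontology a suggestion came from when one field is linked to many.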
Configuration to add external controlled vocabularies
Pull Request to Dataverse core https://github.com/IQSS/dataverse/pull/7712
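The configuration is stored in the `:CVocConf` database setting as a JSON array with one entry per managed field. The sketch below assembles one such entry and builds (without sending) the admin API request that stores it. The key names follow our reading of the Dataverse 5.7 configuration guide and should be verified there; the Skosmos host, vocabulary id and child-field choice are hypothetical placeholders.

```python
import json
from typing import List
from urllib.request import Request

# Keys treated as mandatory in this sketch (an assumption; check the guide).
REQUIRED_KEYS = ("field-name", "cvoc-url", "js-url", "protocol", "retrieval-uri")

example_entry = {
    "field-name": "keyword",                      # field whose input is managed
    "term-uri-field": "keywordVocabularyURI",     # illustrative: child field for the term URI
    "cvoc-url": "https://skosmos.example.org/",   # hypothetical Skosmos host
    "js-url": "/resources/js/cvoc-interface.js",  # script rendering the widget
    "protocol": "skosmos",
    "retrieval-uri": "https://skosmos.example.org/rest/v1/data?uri={0}",
    "allow-free-text": True,
    "languages": "en, fr, es, ru",
    "vocabs": {"unesco": {"uriSpace": "https://vocab.example.org/unesco/"}},
}


def validate(entries: List[dict]) -> None:
    """Raise ValueError if an entry lacks one of REQUIRED_KEYS."""
    for i, entry in enumerate(entries):
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing {missing}")


def build_cvoc_request(server_url: str, entries: List[dict]) -> Request:
    """PUT the JSON array to the :CVocConf setting via the admin API (not sent)."""
    validate(entries)
    body = json.dumps(entries, indent=2).encode("utf-8")
    return Request(f"{server_url.rstrip('/')}/api/admin/settings/:CVocConf",
                   data=body, method="PUT")
```

Validating locally before the upload avoids storing a half-configured entry that the deposit form would then fail to render.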
JavaScript interface
The CV interface is implemented in JavaScript and placed outside of the Dataverse application.
Internal:
"js-url": "/resources/js/cvoc-interface.js"
External:
"js-url": "https://raw.githubusercontent.com/Dans-labs/semantic-gateway/main/static/js/interface.js"
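Because Dataverse only needs a URL to the script, the internal and external variants differ only in where the browser fetches it from. A small sketch, using only `urllib.parse`, that classifies a `js-url` value and resolves it against a Dataverse server URL (the server URL is a placeholder):

```python
from typing import Tuple
from urllib.parse import urljoin, urlparse


def resolve_js_url(server_url: str, js_url: str) -> Tuple[str, str]:
    """Return ("internal" | "external", absolute URL the browser will load)."""
    parsed = urlparse(js_url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Absolute URL: the script is hosted outside the Dataverse application.
        return "external", js_url
    if js_url.startswith("/") and not parsed.netloc:
        # Server-relative path: served by the Dataverse web application itself.
        return "internal", urljoin(server_url, js_url)
    raise ValueError(f"unsupported js-url: {js_url!r}")
```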
Example of the CV configuration in Dataverse
Configuration in pluggable JavaScript:
● Field cvocDemo connected to the “unesco” controlled vocabulary hosted by Skosmos
● 4 languages available (en, fr, es, ru)
● js-url pointing to a JavaScript gateway that reads and transforms the output of an external API endpoint
● every Skosmos concept cached internally in Dataverse to improve sustainability
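Caching concepts inside Dataverse keeps stored metadata readable even when the external service is slow or offline. A minimal sketch of the idea: look a concept up once by its URI, keep only the configured languages (en, fr, es, ru), and serve later requests from the cache. The in-memory dict and the `fetch` callable are stand-ins introduced for illustration; Dataverse itself persists cached values in its database.

```python
from typing import Callable, Dict, Iterable


class ConceptCache:
    """Fetch each concept once by URI and keep only the configured languages."""

    def __init__(self, fetch: Callable[[str], dict],
                 languages: Iterable[str] = ("en", "fr", "es", "ru")):
        self._fetch = fetch              # hypothetical: calls the external API
        self._languages = set(languages)
        self._store: Dict[str, dict] = {}

    def get(self, uri: str) -> dict:
        if uri not in self._store:
            concept = dict(self._fetch(uri))
            # Drop labels in languages the field is not configured for.
            concept["prefLabel"] = {lang: label
                                    for lang, label in concept.get("prefLabel", {}).items()
                                    if lang in self._languages}
            self._store[uri] = concept
        return self._store[uri]
```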
We created the Semantic Gateway as a plugin app
Source: Dataverse gateway
Semantic Gateway for Skosmos and NDE
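The gateway's job is to turn each external service's response into one shape the deposit form can consume. The sketch below shows that adapter idea with a registry keyed by source name. Only a Skosmos adapter is included, assuming search results arrive under `results` with `uri`, `prefLabel`, `lang` and `vocab` keys; the normalized `{uri, label, lang, source}` shape is our own choice, not the Semantic Gateway's actual format, and an NDE adapter would be registered the same way.

```python
from typing import Callable, Dict, List

Suggestion = Dict[str, str]
ADAPTERS: Dict[str, Callable[[dict], List[Suggestion]]] = {}


def adapter(source: str):
    """Decorator that registers a transform for one external source."""
    def register(fn: Callable[[dict], List[Suggestion]]):
        ADAPTERS[source] = fn
        return fn
    return register


@adapter("skosmos")
def from_skosmos(payload: dict) -> List[Suggestion]:
    # Assumed Skosmos search response layout: {"results": [{...}, ...]}.
    return [{"uri": r["uri"], "label": r.get("prefLabel", ""),
             "lang": r.get("lang", ""), "source": r.get("vocab", "")}
            for r in payload.get("results", [])]


def normalize(source: str, payload: dict) -> List[Suggestion]:
    """Dispatch a raw API payload to the adapter registered for `source`."""
    if source not in ADAPTERS:
        raise ValueError(f"no adapter registered for {source!r}")
    return ADAPTERS[source](payload)
```

With this split, the JavaScript widget only ever sees the normalized shape, and adding a new source means registering one more adapter rather than changing the form.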
Suggestions for using FAIR CVs
● Dutch Digital Heritage Network https://netwerkdigitaalerfgoed.nl
● Skosmos instances, for example, https://bartoc-skosmos.unibas.ch/en/
Skosmos client to access vocabularies https://pypi.org/project/skosmos-client/
● ORCID API to link CMDI records to researcher identifiers https://info.orcid.org
● CESSDA CV Service https://vocabularies.cessda.eu
More are coming! https://github.com/CLARIAH/awesome-humanities-ontologies
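When linking records to researchers through ORCID, a cheap first check is the iD's built-in check character: ORCID iDs use the ISO 7064 MOD 11-2 checksum, so a malformed identifier can be rejected before any API call. A minimal sketch (the function names are our own):

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Accept "0000-0002-1825-0097" or "https://orcid.org/0000-0002-1825-0097"."""
    orcid = orcid.strip()
    for prefix in ("https://orcid.org/", "http://orcid.org/"):
        if orcid.startswith(prefix):
            orcid = orcid[len(prefix):]
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]
```

For ORCID's sample iD 0000-0002-1825-0097, the first 15 digits yield the check character 7, so `is_valid_orcid` accepts it; changing any single digit makes the check fail.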
Known issues with the support of external CVs
● how CV support could be applied to any field
● support and ownership of available vocabularies
● backward compatibility with fields from the old metadata schema
● clean UI experience (one selection can fill 1, 2 or 4 child fields)
● whether non-managed vocabularies or free-text values can be used in the same field
● concept drift (the change of meaning of concepts)
● interoperability across all Dataverse instances
● how to ensure CVs are coming from authoritative services
Future plans
● Dataverse will be offered as an easy-to-install and easy-to-maintain “archive in the box” solution available to all data providers
● External controlled vocabularies will be available out of the box and will be included in the CESSDA Metadata Model (CMM) and CLARIN CMDI
● Dataverse administrators should be able to turn on external CV support for any specific metadata field
● The same functionality will be implemented at the data file level so that variables can be linked to external CVs
Future plans: linking data (files) to external CVs
Source: Scholars Portal’s Data Curation Tool (Canada)
Questions?
Slava Tykhonov (DANS-KNAW)
vyacheslav.tykhonov@dans.knaw.nl
References:
Dataverse 5.7: https://github.com/IQSS/dataverse/releases/tag/v5.7
Semantic Gateway: https://github.com/Dans-labs/semantic-gateway
SSHOC task 5.2: http://github.com/SSHOC