Ontologies, controlled vocabularies and Dataverse

Presentation on Semantic Web technologies for the Dataverse Metadata Working Group run by the Institute for Quantitative Social Science (IQSS) of Harvard University.

  1. Ontologies, controlled vocabularies and Dataverse
     Slava Tykhonov, Senior Information Scientist, Research & Innovation (DANS-KNAW)
     Dataverse community call, Harvard University, 03.12.2020
  2. Overall goals for DANS-KNAW
     ● DANS-KNAW runs EASY, a Trusted Digital Repository, as a service; it is time to get the data back from the archive, convert it and put it into Dataverse, ready for curation.
     ● DANS-KNAW wants to run Data Stations with metadata created and maintained by different research communities.
     ● The long-term goal of DANS is to make all datasets harvestable and accessible, and to create an interoperability layer with external controlled vocabularies (FAIR Data Point).
  3. DANS Data Stations: Future Data Services
  4. The importance of standards and ontologies
     Generic controlled vocabularies for linking metadata in bibliographic collections are well known: ORCID, GRID, GeoNames, Getty.
     Medical knowledge graphs are powered by:
     ● Biological Expression Language (BEL)
     ● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NIH)
     ● Wikidata (open ontology), integrated with Wikipedia
     Integration is based on metadata standards:
     ● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
     Most of the prominent ontologies are already available as web services with API endpoints.
  5. FAIR Dataverse
     Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  6. Interoperability in EOSC
     ● Technical interoperability is defined as the “ability of different information technology systems and software applications to communicate and exchange data”. It should allow them “to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”.
     ● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data”.
     ● Organisational interoperability refers to the “way in which organisations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. Focus on the requirements of the user community by making services available, easily identifiable, accessible and user-focused”.
     ● Legal interoperability covers “the broader environment of laws, policies, procedures and cooperation agreements”.
     Source: EOSC Interoperability Framework v1.0
  7. Our goals to increase Dataverse interoperability
     Provide a custom FAIR metadata schema for European research communities:
     ● CESSDA metadata (Consortium of European Social Science Data Archives)
     ● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community
     Connect metadata to ontologies and CVs:
     ● link metadata fields to common ontologies (Dublin Core, DCAT)
     ● define semantic relationships between (new) metadata fields (SKOS)
     ● select available external controlled vocabularies for specific fields
     ● provide multilingual access to controlled vocabularies
  8. Introduction of Data Catalog Vocabulary (DCAT)
     Source: W3C DCAT recommendation
     DCAT defines three main classes:
     ● dcat:Catalog represents the catalog.
     ● dcat:Dataset represents a dataset in a catalog.
     ● dcat:Distribution represents an accessible form of a dataset.
     DCAT makes extensive use of terms from RDF, Dublin Core, SKOS and other vocabularies!
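To make the three DCAT classes concrete, here is a minimal sketch that links one catalog, one dataset and one distribution and serializes them as N-Triples. The class and property IRIs (dcat:Catalog, dcat:dataset, dcat:distribution, dcat:downloadURL, dcterms:title) are standard DCAT and Dublin Core terms; all example.org identifiers and the title are hypothetical, and a real export would more likely use an RDF library than hand-built strings.

```python
# Minimal DCAT description serialized as N-Triples (standard library only).
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# (subject, predicate, object), each already in N-Triples term syntax
Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def dcat_triples(catalog: str, dataset: str, distribution: str,
                 title: str, download_url: str) -> List[Triple]:
    """Link one catalog, one dataset and one distribution with DCAT terms."""
    return [
        (iri(catalog), iri(RDF_TYPE), iri(DCAT + "Catalog")),
        (iri(catalog), iri(DCAT + "dataset"), iri(dataset)),
        (iri(dataset), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset), iri(DCT + "title"), literal(title, "en")),
        (iri(dataset), iri(DCAT + "distribution"), iri(distribution)),
        (iri(distribution), iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (iri(distribution), iri(DCAT + "downloadURL"), iri(download_url)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """One triple per line, each terminated by ' .'"""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical example.org identifiers
    print(to_ntriples(dcat_triples(
        "https://example.org/catalog",
        "https://example.org/dataset/1",
        "https://example.org/dataset/1/csv",
        "Survey data",
        "https://example.org/files/1.csv",
    )))
```

The dataset title reuses a Dublin Core term rather than a DCAT one, which is exactly the kind of vocabulary reuse the slide refers to.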
  9. Simple Knowledge Organization System (SKOS)
     SKOS models thesaurus-like resources:
     - skos:Concept instances have preferred labels and alternative labels (synonyms) attached to them (skos:prefLabel, skos:altLabel).
     - A skos:Concept can be related to other concepts with the skos:broader, skos:narrower and skos:related properties.
     - Terms and concepts can have more than one broader term or concept.
     SKOS allows us to create a semantic layer on top of objects: a network of statements and relationships. A major difference from formal ontologies concerns logical “is-a” hierarchies: in thesauri, the hierarchical relation can represent anything from “is-a” to “part-of”.
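The SKOS features listed above (per-language preferred labels, synonyms, several broader concepts per concept) can be sketched with plain dataclasses. This is a toy in-memory model, not an RDF store; the `ex:` concepts and labels are hypothetical. Note that SKOS does not declare skos:broader itself transitive (that is what skos:broaderTransitive is for), so `all_broader` below computes the ancestors explicitly, and skos:narrower is derived as the inverse of skos:broader instead of being stored twice.

```python
# Toy SKOS concept scheme held in plain dataclasses (standard library only).
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    uri: str
    pref_label: Dict[str, str]                           # skos:prefLabel, at most one per language
    alt_labels: List[str] = field(default_factory=list)  # skos:altLabel (synonyms)
    broader: List[str] = field(default_factory=list)     # skos:broader; several parents allowed
    related: List[str] = field(default_factory=list)     # skos:related (associative, non-hierarchical)


def narrower(scheme: Dict[str, Concept], uri: str) -> List[str]:
    """skos:narrower is the inverse of skos:broader, so derive it instead of storing it."""
    return sorted(c.uri for c in scheme.values() if uri in c.broader)


def all_broader(scheme: Dict[str, Concept], uri: str) -> Set[str]:
    """Every ancestor of a concept, following skos:broader links (cycle-safe)."""
    seen: Set[str] = set()
    stack = list(scheme[uri].broader)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(scheme[current].broader)
    return seen


# Hypothetical mini-thesaurus: "cats" has two broader concepts.
SCHEME = {c.uri: c for c in [
    Concept("ex:animals", {"en": "animals"}),
    Concept("ex:mammals", {"en": "mammals"}, broader=["ex:animals"]),
    Concept("ex:pets", {"en": "pets"}, alt_labels=["companion animals"], broader=["ex:animals"]),
    Concept("ex:cats", {"en": "cats", "nl": "katten"}, alt_labels=["felines"],
            broader=["ex:mammals", "ex:pets"]),
]}
```

Here `all_broader(SCHEME, "ex:cats")` reaches `ex:animals` through both parents, but the set keeps it only once.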
  10. RDF graph using the SKOS Core Vocabulary
      Source: SKOS Core Guide
  11. Global Research Identifier Database (GRID) in SKOS
      Can we provide humans with a convenient web interface to create links to data points? Can we use machine learning algorithms to predict links and convert data to SKOS automatically?
  12. Linked Data integration challenges
      ● datasets are very heterogeneous and multilingual
      ● data usually lacks sufficient quality control
      ● data providers use different modeling schemas and styles
      ● linked data cleansing and versioning are very difficult to track and maintain properly; web resources aren’t persistent
      ● even modern data repositories provide only metadata records describing the data, without giving access to individual data items stored in files
      ● it is difficult to assign, and manually keep up to date, entity relationships in a knowledge graph
      We need semantic relationships among metadata fields and their values!
  13. What is semantics?
      Semantics (from Ancient Greek: σημαντικός sēmantikós, "significant")[a][1] is the study of meaning. The term can be used to refer to subfields of several distinct disciplines including linguistics, philosophy, and computer science.
      Linguistics
      In linguistics, semantics is the subfield that studies meaning. Semantics can address meaning at the levels of words, phrases, sentences, or larger units of discourse. One of the crucial questions which unites different approaches to linguistic semantics is that of the relationship between form and meaning.[2]
      Computer science
      In computer science, the term semantics refers to the meaning of language constructs, as opposed to their form (syntax). According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared."[14]
      (from Wikipedia)
  14. Semantics in Dataverse metadata schema
  15. Dataverse datasetfield API
      curl http://localhost:8080/api/admin/datasetfield/title
      To-do list for Dataverse core:
      ● add a TermURI for metadata fields (DC)
      ● show the external controlled vocabularies available for a specific field
      ● add multilingual support with a ‘lang’ parameter
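The curl call above can also be issued from a script. This is a minimal sketch using only the standard library; it assumes the usual Dataverse API response envelope `{"status": "OK", "data": {...}}` and deliberately does not assume which keys appear inside `data`. The base URL mirrors the curl example; everything else (function names, timeout) is illustrative.

```python
# Fetch a metadata field definition from the Dataverse admin API (stdlib only).
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # same local instance as the curl example


def datasetfield_url(field_name: str, base_url: str = BASE_URL) -> str:
    """Build /api/admin/datasetfield/<name>, URL-encoding the field name."""
    return f"{base_url.rstrip('/')}/api/admin/datasetfield/{quote(field_name)}"


def unwrap(payload: dict) -> dict:
    """Return the 'data' member of a {"status": ..., "data": ...} envelope (assumed shape)."""
    if payload.get("status") != "OK":
        raise RuntimeError(f"Dataverse API error: {payload.get('message', payload)}")
    return payload["data"]


def fetch_datasetfield(field_name: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Script equivalent of `curl <base_url>/api/admin/datasetfield/<field_name>`."""
    with urlopen(datasetfield_url(field_name, base_url), timeout=timeout) as response:
        return unwrap(json.load(response))
```

With a local instance running, `fetch_datasetfield("title")` returns the field description as a dictionary; without one, `urlopen` raises a `URLError`.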
  16. Semantic Gateway as a plugin application
      Source: Dataverse gateway
  17. Semantic Gateway configuration
  18. Dataverse deposit form with connection to ontologies
      Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
  19. One metadata field can be linked to many ontologies
      The language switch in Dataverse will change the language of the suggested terms!
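One plausible way to implement the language switch is to pick, for each suggested concept, the skos:prefLabel matching the interface language and to fall back gracefully when a vocabulary lacks that language. The sketch below is an assumption about how this could work, not the Semantic Gateway's actual logic; the concept labels are hypothetical.

```python
# Pick the label matching the interface language, with fallbacks (stdlib only).
from typing import Dict, Iterable, List


def label_for(pref_labels: Dict[str, str], lang: str, fallbacks: Iterable[str] = ("en",)) -> str:
    """Try the exact tag (e.g. 'nl-BE'), then its primary subtag ('nl'), then the fallbacks."""
    primary = lang.split("-")[0]
    for candidate in (lang, primary, *fallbacks):
        if candidate in pref_labels:
            return pref_labels[candidate]
    # Last resort: any available label, chosen deterministically by language tag.
    return pref_labels[min(pref_labels)]


def suggestions(concepts: List[Dict[str, str]], lang: str) -> List[str]:
    """Labels to show in the deposit form after a language switch."""
    return [label_for(labels, lang) for labels in concepts]


# Hypothetical prefLabels for two concepts, keyed by language tag.
CONCEPTS = [
    {"en": "Family", "nl": "Gezin", "fr": "Famille"},
    {"en": "Education", "nl": "Onderwijs"},
]
```

Switching to `"fr"` yields `["Famille", "Education"]`: the second concept has no French label, so English is shown instead of an empty suggestion.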
  20. The flexibility of Semantic Gateway
      Source: Semantic Gateway API
  21. Semantic Gateway lookup API
      Scenario: when a user selects a vocabulary and searches for a term, the API takes the filled-in values and returns a list of concepts in a standardized format:
      GET /?lang=language&vocab=vocabulary&term=keyword
      Examples:
      GET /?lang=en&vocab=unesco&query=fam
      GET /?vocab=mesh&query=sars
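A client can build these lookup requests with `urllib.parse.urlencode`, which also takes care of escaping user input. The sketch below reproduces the two example URLs; the base URL is hypothetical, `lang` is treated as optional (as in the second example), and because the generic form names the search parameter `term` while the examples use `query`, the parameter name is left configurable rather than guessed.

```python
# Build Semantic Gateway lookup URLs in the shape of the slide's examples (stdlib only).
from typing import Optional
from urllib.parse import urlencode

GATEWAY = "http://localhost:8000/"  # hypothetical base URL of a Semantic Gateway instance


def lookup_url(vocab: str, text: str, lang: Optional[str] = None,
               base: str = GATEWAY, term_param: str = "query") -> str:
    """GET /?lang=<lang>&vocab=<vocab>&<term_param>=<text>; 'lang' is optional."""
    params = {}
    if lang:
        params["lang"] = lang
    params["vocab"] = vocab
    params[term_param] = text  # examples use 'query'; the generic form names it 'term'
    return base + "?" + urlencode(params)
```

`urlencode` keeps the dictionary's insertion order and encodes spaces as `+`, so a multi-word search such as `"sars cov"` becomes `sars+cov` in the query string.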
  22. Semantic Gateway interface
  23. Use case: CMDI, a hierarchical metadata schema
      Some conclusions:
      ● Top-level concepts (CMDI components) can share the same concepts
      ● Relations between concepts define the metadata schema
      ● Disambiguation of concepts is complicated
      ● Multilingual components carry a language indication (for example, keywords in Dutch)
      ● The hierarchy is defined by semantics
  24. Use case: CMDI data model and namespaces
      A default namespace was added in Semantic Gateway for the CMDI schema to keep all relationships between top-level concepts (metadata fields) in the knowledge graph:
      ns.dataverse.org/cmdi_component/cmdi_term
      However, a component or element in CMDI only has a unique name among its siblings, so:
      Source: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from CLARIN to Linked Open Data
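Because CMDI names are only unique among siblings, minting a URI from the element name alone can merge two different fields into one graph node. The sketch below contrasts name-only URIs with the component/term pattern; the field pairs are hypothetical (`resourceCreator` in particular is invented for illustration). Qualifying by the parent component reduces clashes, but only the full path guarantees uniqueness.

```python
# Compare name-only URIs with component-qualified URIs (stdlib only).
from collections import Counter
from typing import List, Tuple

NS = "ns.dataverse.org/"  # namespace root from the slide; a resolvable IRI would also need a scheme


def mint_uri(component: str, term: str, ns: str = NS) -> str:
    """Follow the ns.dataverse.org/cmdi_component/cmdi_term pattern."""
    return f"{ns}{component}/{term}"


def collisions(uris: List[str]) -> List[str]:
    """URIs minted more than once, i.e. distinct fields that would be conflated in the graph."""
    return sorted(uri for uri, count in Counter(uris).items() if count > 1)


# Hypothetical fields: the same element name under two different parent components.
FIELDS: List[Tuple[str, str]] = [("metadataCreator", "actorInfo"), ("resourceCreator", "actorInfo")]
```

Name-only URIs collide on `ns.dataverse.org/actorInfo`, while `mint_uri` keeps the two fields apart.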
  25. Adding component-specific URIs in SKOS
      The CMDI Component Registry was created for registered components/profiles.
      Example path in CMDI:
      /CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo/metadataCreator/actorInfo/actorType
      ns.dataverse.org/cmdi1/metadataCreator skos:narrower ns.dataverse.org/cmdi1/actorInfo
      or simply:
      cmdi1:metadataCreator skos:related cmdi1:corpusProfile
      In a next step, CMDI concepts could be linked to other SKOS concepts.
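A CMDI element path can be turned into SKOS hierarchy statements mechanically. In SKOS, `A skos:broader B` states that B is the more general concept, so in this sketch each element points at its parent element in the path. Two assumptions are labeled in the code: `CMD` and `Components` are treated as envelope elements rather than concepts, and statements are emitted as plain tuples with the slide's scheme-less namespace instead of full RDF.

```python
# Turn a CMDI element path into SKOS broader statements (stdlib only).
from typing import List, Tuple

CMDI_NS = "ns.dataverse.org/cmdi1/"  # namespace from the slide (no URI scheme given there)
ENVELOPE = {"CMD", "Components"}      # assumed to be wrappers, not concepts


def path_to_broader(path: str, ns: str = CMDI_NS) -> List[Tuple[str, str, str]]:
    """Each element gets skos:broader pointing at its parent element."""
    names = [p for p in path.strip("/").split("/") if p and p not in ENVELOPE]
    return [(ns + child, "skos:broader", ns + parent)
            for parent, child in zip(names, names[1:])]


EXAMPLE_PATH = ("/CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo/"
                "metadataCreator/actorInfo/actorType")
```

The example path yields five statements, starting with `resourceCommonInfo skos:broader corpusProfile` and including `actorInfo skos:broader metadataCreator`.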
  26. How can we link CMDI components in SKOS?
      Source: CMDI Component Registry
  27. Export from Dataverse metadata back to CMDI
      Basic requirements:
      The Dataverse metadata schema should hold CMDI metadata that can be extended with custom components used by CLARIN centers in different countries.
      The original relationships between fields and concepts should be kept, and custom components should be added to the SKOS schema.
      Users should be able to download metadata in the original CMDI format without loss of quality.
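If the hierarchy is kept as child-to-parent (skos:broader) links, the export requirement can be approached by walking those links back up to the root and re-nesting the elements. This sketch uses `xml.etree.ElementTree` and the local names from the example path above; the value `"person"` is hypothetical, and a conformant CMDI record would also need the CMD envelope, header and profile namespace, which are omitted here.

```python
# Rebuild a nested CMDI-style XML fragment from child -> parent links (stdlib only).
import xml.etree.ElementTree as ET
from typing import Dict, List


def path_from_broader(broader: Dict[str, str], leaf: str) -> List[str]:
    """Walk skos:broader links from the leaf up to the root, then reverse."""
    chain = [leaf]
    while chain[-1] in broader:
        chain.append(broader[chain[-1]])
        if len(chain) > len(broader) + 1:
            raise ValueError("cycle in broader links")
    return list(reversed(chain))


def to_cmdi_fragment(broader: Dict[str, str], leaf: str, value: str) -> str:
    """Nest one element per path step and put the value in the innermost one."""
    path = path_from_broader(broader, leaf)
    root = ET.Element(path[0])
    node = root
    for name in path[1:]:
        node = ET.SubElement(node, name)
    node.text = value
    return ET.tostring(root, encoding="unicode")


# Local names from the example CMDI path; the value "person" is hypothetical.
BROADER = {"actorType": "actorInfo", "actorInfo": "metadataCreator", "metadataCreator": "metadataInfo"}
```

A single parent per element is enough here because CMDI is a tree; a SKOS concept with several broader concepts would need an explicit choice of which path to export.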
  28. The FAIR Signposting Profile
      Herbert Van de Sompel, DANS Chief Innovation Officer, https://hvdsomp.info
      Two levels of access to web resources:
      ● level one provides a concise, minimal set of links by value in the HTTP header
      ● level two delivers a complete, comprehensive set of links by reference, in a standalone document (link set)
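On the client side, level one means reading typed links (for example `cite-as` or `describedby`) from the HTTP `Link` header, and a `rel="linkset"` link is how level two points to the standalone link set document. The parser below is a simplified sketch of RFC 8288 syntax that assumes commas only separate links; the header and all example.org URLs are hypothetical.

```python
# Parse a level-one Signposting HTTP Link header into typed links (stdlib only).
import re
from typing import Dict, List


def parse_link_header(header: str) -> List[Dict[str, str]]:
    """Simplified RFC 8288 parser: one dict per (target, relation type) pair.

    Assumes commas only separate links; commas inside quoted values are not handled.
    """
    links = []
    for part in re.split(r",\s*(?=<)", header.strip()):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        target, rest = match.groups()
        params = {}
        for param in rest.split(";"):
            if "=" in param:
                key, value = param.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        # A rel value may list several space-separated relation types.
        for rel in params.pop("rel", "").split():
            links.append({"href": target, "rel": rel, **params})
    return links


def hrefs(links: List[Dict[str, str]], rel: str) -> List[str]:
    """Targets of all links with the given relation type."""
    return [link["href"] for link in links if link["rel"] == rel]


# Hypothetical header for a dataset landing page.
HEADER = ('<https://example.org/doi/123>; rel="cite-as", '
          '<https://example.org/meta.xml>; rel="describedby"; type="application/xml", '
          '<https://example.org/linkset>; rel="linkset"; type="application/linkset+json"')
```

A production client should use a complete RFC 8288 parser; this sketch only shows how the by-value links and the by-reference link set are told apart.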
  29. Dataverse (meta)data in FAIR Data Point (FDP)
      ● a RESTful web service that enables data owners to expose their datasets using rich machine-readable metadata
      ● provides standardized descriptions (RDF-based metadata) using controlled vocabularies and ontologies
      ● the FDP specification is public
      Source: FDP
      The goal is to run FDP on the Dataverse side (DCAT, CVs) and provide metadata export in RDF!
  30. Questions?
      Slava Tykhonov, Senior Information Scientist
      vyacheslav.tykhonov@dans.knaw.nl
