Ontologies, controlled vocabularies and Dataverse
Slava Tykhonov
Senior Information Scientist,
Research & Innovation (DANS-KNAW)
Dataverse community call, Harvard University, 03.12.2020
Overall goals for DANS-KNAW
● DANS-KNAW runs the EASY Trusted Digital Repository as a service. It is now
time to retrieve the data from the archive, convert it, and deposit it in
Dataverse, ready for curation
● DANS-KNAW wants to run Data Stations with metadata created and
maintained by different research communities
● The long-term goal of DANS is to make all datasets harvestable and
approachable, and to create an interoperability layer with external controlled
vocabularies (FAIR Data Point)
DANS Data Stations - Future Data Services
The importance of standards and ontologies
Generic controlled vocabularies to link metadata in the bibliographic collections are well
known: ORCID, GRID, GeoNames, Getty.
Medical knowledge graphs powered by:
● Biological Expression Language (BEL)
● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NLM)
● Wikidata (Open ontology) - Wikipedia
Integration based on metadata standards:
● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
Most of the prominent ontologies are already available as web services with API endpoints.
FAIR Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Interoperability in EOSC
● Technical interoperability defined as the “ability of different information technology systems and
software applications to communicate and exchange data”. It should allow “to accept data from each
other and perform a given task in an appropriate and satisfactory manner without the need for extra
operator intervention”.
● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous,
shared meaning. Semantic interoperability is a requirement to enable machine computable logic,
inferencing, knowledge discovery, and data”.
● Organisational interoperability refers to the “way in which organisations align their business
processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial
goals. Focus on the requirements of the user community by making services available, easily
identifiable, accessible and user-focused”.
● Legal interoperability covers “the broader environment of laws, policies, procedures and
cooperation agreements”
Source: EOSC Interoperability Framework v1.0
Our goals to increase Dataverse interoperability
Provide a custom FAIR metadata schema for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from CLARIN
linguistics community
Connect metadata to ontologies and CVs:
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for the specific fields
● provide multilingual access to controlled vocabularies
Introduction of Data Catalog Vocabulary (DCAT)
Source: W3C DCAT recommendation
DCAT defines three main classes:
● dcat:Catalog represents the catalog
● dcat:Dataset represents a dataset in a catalog
● dcat:Distribution represents an accessible form of a dataset
DCAT makes extensive use of terms from RDF, Dublin Core, SKOS, and other vocabularies.
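As an illustration of how the three classes nest, here is a minimal Python sketch that builds a JSON-LD description with one catalog, one dataset and one distribution. The URIs and the title are hypothetical placeholders; dcat:dataset, dcat:distribution and dcat:downloadURL are properties from the DCAT vocabulary, and dct:title comes from Dublin Core.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def build_catalog(catalog_uri, dataset_uri, title, download_url):
    """Return a JSON-LD dict: a catalog holding one dataset with one distribution."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": catalog_uri,
        "@type": "dcat:Catalog",
        "dcat:dataset": [
            {
                "@id": dataset_uri,
                "@type": "dcat:Dataset",
                "dct:title": title,
                "dcat:distribution": [
                    {
                        "@type": "dcat:Distribution",
                        "dcat:downloadURL": {"@id": download_url},
                    }
                ],
            }
        ],
    }


# Hypothetical identifiers, used only for illustration.
catalog = build_catalog(
    "https://example.org/catalog",
    "https://example.org/dataset/1",
    "Example dataset",
    "https://example.org/dataset/1/files/data.csv",
)
print(json.dumps(catalog, indent=2))
```

The nesting mirrors the class relationships: a catalog lists datasets, and each dataset lists the distributions through which it can be accessed.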
Simple Knowledge Organization System (SKOS)
SKOS models thesaurus-like resources:
- skos:Concept instances carry preferred labels and alternative labels (synonyms)
(skos:prefLabel, skos:altLabel).
- Concepts can be related to each other with the skos:broader, skos:narrower and skos:related properties.
- A concept can have more than one broader concept.
SKOS makes it possible to create a semantic layer on top of objects: a network of statements and relationships.
A major difference between SKOS and formal ontologies lies in logical “is-a” hierarchies: in thesauri, the
hierarchical relation can represent anything from “is-a” to “part-of”.
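The SKOS building blocks listed above can be sketched in a few lines of Python. This is a minimal in-memory model, not an RDF library: the `Concept` class, the helper functions and the `ex:` identifiers are all hypothetical, and only the skos: property names come from the SKOS vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-like concept with labels and links to other concepts."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)
    broader: list = field(default_factory=list)   # URIs of broader concepts
    related: list = field(default_factory=list)   # URIs of related concepts


def to_triples(concept):
    """Flatten one concept into (subject, predicate, object) statements."""
    triples = [
        (concept.uri, "rdf:type", "skos:Concept"),
        (concept.uri, "skos:prefLabel", concept.pref_label),
    ]
    triples += [(concept.uri, "skos:altLabel", label) for label in concept.alt_labels]
    triples += [(concept.uri, "skos:broader", uri) for uri in concept.broader]
    triples += [(concept.uri, "skos:related", uri) for uri in concept.related]
    return triples


def narrower_index(concepts):
    """Derive skos:narrower links as the inverse of skos:broader."""
    index = {}
    for concept in concepts:
        for parent in concept.broader:
            index.setdefault(parent, []).append(concept.uri)
    return index


# Hypothetical concept with two broader concepts (a polyhierarchy).
family = Concept(
    uri="ex:family",
    pref_label="Family",
    alt_labels=["Kinship group"],
    broader=["ex:social-groups", "ex:households"],
)
triples = to_triples(family)
narrower = narrower_index([family])
for triple in triples:
    print(triple)
```

Because `broader` is a list, one concept can sit under several parents, which is the case the third bullet describes.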
RDF graph using the SKOS Core Vocabulary
Source: SKOS Core Guide
Global Research Identifier Database (GRID) in SKOS
Can we provide humans with a convenient web interface to create links to data points?
Can we use machine learning algorithms to predict links and convert data to SKOS automatically?
Linked Data integration challenges
● datasets are very heterogeneous and multilingual
● data usually lacks sufficient quality control
● data providers use different modeling schemas and styles
● linked data cleansing and versioning are very difficult to track and maintain
properly, and web resources aren’t persistent
● even modern data repositories provide only metadata records describing
data, without giving access to the individual data items stored in files
● entity relationships in a knowledge graph are difficult to assign and to keep
up to date manually
We need semantic relationships among metadata fields and their values!
What is semantics?
Semantics (from Ancient Greek: σημαντικός sēmantikós, "significant") is the study of meaning. The term can be used to
refer to subfields of several distinct disciplines including linguistics, philosophy, and computer science.
Linguistics
In linguistics, semantics is the subfield that studies meaning. Semantics can address meaning at the levels of words,
phrases, sentences, or larger units of discourse. One of the crucial questions which unites different approaches to linguistic
semantics is that of the relationship between form and meaning.
Computer science
In computer science, the term semantics refers to the meaning of language constructs, as opposed to their form (syntax).
According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly
but constrains the possible interpretations of what is declared."
(from Wikipedia)
Semantics in Dataverse metadata schema
Dataverse datasetfield API
curl http://localhost:8080/api/admin/datasetfield/title
To do list for Dataverse core:
● add TermURI for metadata fields (DC)
● show external controlled vocabularies available for the specific field
● add multilingual support with ‘lang’ parameter
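The curl call above can also be issued from Python. The sketch below only builds the request URL and returns the raw response body, because the response format is not shown on the slide. The `lang` query parameter is an item on the to-do list, so treat it as a proposal rather than an existing feature; the function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def datasetfield_url(field_name, base="http://localhost:8080", lang=None):
    """Build the admin API URL for one metadata field.

    `lang` is the proposed (not yet implemented) multilingual parameter.
    """
    url = f"{base}/api/admin/datasetfield/{quote(field_name)}"
    if lang:
        url += "?" + urlencode({"lang": lang})
    return url


def fetch_datasetfield(field_name, **kwargs):
    """Return the raw response body; needs a running Dataverse instance."""
    with urlopen(datasetfield_url(field_name, **kwargs), timeout=10) as response:
        return response.read().decode("utf-8")


print(datasetfield_url("title"))
print(datasetfield_url("title", lang="nl"))
```

Calling `fetch_datasetfield("title")` against a local installation should return the same body as the curl command.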
Semantic Gateway as plugin application
Source: Dataverse gateway
Semantic Gateway configuration
Dataverse deposit form with connection to
ontologies
Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
One metadata field can be linked to many ontologies
Language switch in Dataverse will change the language of suggested terms!
The flexibility of Semantic Gateway
Source: Semantic Gateway API
Semantic Gateway lookup API
Scenario: when a user selects a vocabulary and searches for a term, the API receives the
entered values and returns the list of matching concepts in a standardized format:
GET /?lang=language&vocab=vocabulary&term=keyword
examples:
GET /?lang=en&vocab=unesco&query=fam
GET /?vocab=mesh&query=sars
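A client could assemble these lookup requests as in the following Python sketch. The gateway base URL is a hypothetical placeholder, and the keyword is sent as `query`, following the two examples above; adjust the parameter name if your gateway expects a different one.

```python
from urllib.parse import urlencode


def lookup_url(base, vocab, query, lang=None):
    """Build a Semantic Gateway lookup URL; `lang` is optional, as in the examples."""
    params = {}
    if lang:
        params["lang"] = lang
    params["vocab"] = vocab
    params["query"] = query
    return f"{base.rstrip('/')}/?{urlencode(params)}"


GATEWAY = "http://localhost:8000"  # hypothetical gateway address

unesco_url = lookup_url(GATEWAY, "unesco", "fam", lang="en")
mesh_url = lookup_url(GATEWAY, "mesh", "sars")
print(unesco_url)
print(mesh_url)
```

`urlencode` also takes care of escaping, so search terms with spaces or non-ASCII characters stay valid in the URL.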
Semantic Gateway interface
Use case: CMDI, hierarchical metadata schema
Some conclusions:
● Top-level concepts (CMDI components) can share the same concepts
● Relations between concepts define metadata schema
● Disambiguation of concepts is complicated
● Multilingual components have language indication (for example, keywords in Dutch)
● Hierarchy defined by semantics
Use case: CMDI data model and namespaces
A default namespace was added to the Semantic Gateway for the CMDI schema to keep all relationships
between top-level concepts (metadata fields) in the knowledge graph:
ns.dataverse.org/cmdi_component/cmdi_term
However, a component or element in CMDI has a unique name among its siblings, so:
Source: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from CLARIN to Linked Open Data
Adding component-specific URIs in SKOS
CMDI Component Registry was created for registered Components/Profiles
Example path in CMDI:
/CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo/metadataCreator/actorInfo/actorType
ns.dataverse.org/cmdi1/metadataCreator skos:broader ns.dataverse.org/cmdi1/actorInfo
or simply: cmdi1:metadataCreator skos:related cmdi1:corpusProfile
CMDI concepts can be linked to other SKOS concepts in the next step.
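One way to derive such statements from a CMDI path is sketched below in Python. It mirrors the subject and object order of the skos:broader example above, with the predicate left as a parameter. Skipping the two leading segments (CMD and Components) is an assumption, and the function names are hypothetical.

```python
NAMESPACE = "ns.dataverse.org/cmdi1"


def component_uris(path, namespace=NAMESPACE, skip=2):
    """Turn a CMDI path into one URI per component.

    `skip` drops the leading 'CMD/Components' segments (an assumption).
    """
    names = [segment for segment in path.split("/") if segment]
    return [f"{namespace}/{name}" for name in names[skip:]]


def link_components(path, predicate="skos:broader", namespace=NAMESPACE):
    """Link each component to the next one along the path."""
    uris = component_uris(path, namespace)
    return [(parent, predicate, child) for parent, child in zip(uris, uris[1:])]


cmdi_path = (
    "/CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo"
    "/metadataCreator/actorInfo/actorType"
)
statements = link_components(cmdi_path)
for subject, predicate, obj in statements:
    print(subject, predicate, obj)
```

Because every component gets a URI under one namespace, the resulting statements can later be extended with links to concepts from other SKOS vocabularies.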
How can we link CMDI components in SKOS?
Source: CMDI Component Registry
Export from Dataverse metadata back to CMDI
Basic requirements:
Dataverse metadata schema should have CMDI metadata that can be extended
by custom components used by CLARIN centers in the different countries.
Original relationships between fields and concepts should be kept, custom
components should be added to SKOS schema.
Users should be able to download metadata in the original CMDI format without
losing quality.
The FAIR Signposting Profile
Herbert Van de Sompel,
DANS Chief Innovation Officer
https://hvdsomp.info
Two levels of access to Web resources:
● level one provides a concise, minimal set of links by value in the HTTP
header
● level two delivers a complete, comprehensive set of links by reference,
meaning in a standalone document (link set)
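The two levels can be illustrated with a short Python sketch: level one serializes a few links into an HTTP Link header value, and level two puts the full set into a standalone link set document that the header points to. The URLs and the DOI are hypothetical; the relation types (cite-as, describedby, item, linkset) are the ones used by Signposting, and the JSON layout follows the link set format of RFC 9264 as I understand it.

```python
import json


def link_header(links):
    """Serialize (target, rel, media_type) tuples into a Link header value."""
    parts = []
    for target, rel, media_type in links:
        part = f'<{target}>; rel="{rel}"'
        if media_type:
            part += f'; type="{media_type}"'
        parts.append(part)
    return ", ".join(parts)


def link_set(anchor, links):
    """Group the same tuples into a link set document for one anchor."""
    context = {"anchor": anchor}
    for target, rel, media_type in links:
        entry = {"href": target}
        if media_type:
            entry["type"] = media_type
        context.setdefault(rel, []).append(entry)
    return {"linkset": [context]}


landing_page = "https://example.org/dataset/1"  # hypothetical landing page
level_one = [
    ("https://doi.org/10.5072/example", "cite-as", None),
    ("https://example.org/dataset/1/linkset.json", "linkset", "application/linkset+json"),
]
level_two = level_one[:1] + [
    ("https://example.org/dataset/1/metadata.jsonld", "describedby", "application/ld+json"),
    ("https://example.org/dataset/1/files/data.csv", "item", "text/csv"),
]

header_value = link_header(level_one)
document = link_set(landing_page, level_two)
print("Link:", header_value)
print(json.dumps(document, indent=2))
```

Keeping the header short and moving the long list of item links into the link set avoids oversized HTTP headers for datasets with many files.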
Dataverse meta(data) in FAIR Data Point (FDP)
● RESTful web service that enables data owners to expose their data sets using
rich machine-readable metadata
● Provides standardized descriptions (RDF-based metadata) using controlled
vocabularies and ontologies
● FDP spec is public
Source: FDP
The goal is to run FDP on the Dataverse side (DCAT, CVs) and provide metadata export in RDF!
Questions?
Slava Tykhonov,
Senior Information Scientist
vyacheslav.tykhonov@dans.knaw.nl

Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills MN
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
Areesha Ahmad
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
Shashank Shekhar Pandey
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
PsychoTech Services
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
Sciences of Europe
 

Recently uploaded (20)

Modelo de slide quimica para powerpoint
Modelo  de slide quimica para powerpointModelo  de slide quimica para powerpoint
Modelo de slide quimica para powerpoint
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
Juaristi, Jon. - El canon espanol. El legado de la cultura española a la civi...
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
 
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of ProteinsGBSN - Biochemistry (Unit 6) Chemistry of Proteins
GBSN - Biochemistry (Unit 6) Chemistry of Proteins
 
Direct Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart AgricultureDirect Seeded Rice - Climate Smart Agriculture
Direct Seeded Rice - Climate Smart Agriculture
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1HOW DO ORGANISMS REPRODUCE?reproduction part 1
HOW DO ORGANISMS REPRODUCE?reproduction part 1
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)Sciences of Europe journal No 142 (2024)
Sciences of Europe journal No 142 (2024)
 

Ontologies, controlled vocabularies and Dataverse

  • 1. Ontologies, controlled vocabularies and Dataverse. Slava Tykhonov, Senior Information Scientist, Research & Innovation (DANS-KNAW). Dataverse community call, Harvard University, 03.12.2020
  • 2. Overall goals for DANS-KNAW
      ● DANS-KNAW runs the EASY Trusted Digital Repository as a service; it is time to get data back from the archive, convert it, and put it in Dataverse ready for curation
      ● DANS-KNAW wants to run Data Stations with metadata created and maintained by different research communities
      ● the long-term goal of DANS is to make all datasets harvestable and approachable, and to create an interoperability layer with external controlled vocabularies (FAIR Data Point)
  • 3. DANS Data Stations - Future Data Services
  • 4. The importance of standards and ontologies
      Generic controlled vocabularies for linking metadata in bibliographic collections are well known: ORCID, GRID, GeoNames, Getty.
      Medical knowledge graphs are powered by:
      ● Biological Expression Language (BEL)
      ● Medical Subject Headings (MeSH®) by U.S. National Library of Medicine (NIH)
      ● Wikidata (Open ontology) - Wikipedia
      Integration based on metadata standards:
      ● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
      Most prominent ontologies are already available as web services with API endpoints.
  • 5. FAIR Dataverse Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 6. Interoperability in EOSC
      ● Technical interoperability is defined as the “ability of different information technology systems and software applications to communicate and exchange data”. It should allow “to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”.
      ● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data”.
      ● Organisational interoperability refers to the “way in which organisations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. Focus on the requirements of the user community by making services available, easily identifiable, accessible and user-focused”.
      ● Legal interoperability covers “the broader environment of laws, policies, procedures and cooperation agreements”.
      Source: EOSC Interoperability Framework v1.0
  • 7. Our goals to increase Dataverse interoperability
      Provide a custom FAIR metadata schema for European research communities:
      ● CESSDA metadata (Consortium of European Social Science Data Archives)
      ● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community
      Connect metadata to ontologies and CVs:
      ● link metadata fields to common ontologies (Dublin Core, DCAT)
      ● define semantic relationships between (new) metadata fields (SKOS)
      ● select available external controlled vocabularies for the specific fields
      ● provide multilingual access to controlled vocabularies
  • 8. Introduction of Data Catalog Vocabulary (DCAT)
      DCAT defines three main classes:
      ● dcat:Catalog represents the catalog
      ● dcat:Dataset represents a dataset in a catalog
      ● dcat:Distribution represents an accessible form of a dataset
      DCAT makes extensive use of terms from RDF, Dublin Core, SKOS, and other vocabularies!
      Source: W3C DCAT recommendation
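The three DCAT classes can be illustrated with a minimal sketch that emits N-Triples using only the Python standard library. The `example.org` URIs and the title are hypothetical placeholders, and the helper names are made up for this illustration; the DCAT and Dublin Core term URIs are the published ones.

```python
# Minimal sketch: one catalog, one dataset, one distribution described with DCAT.
# All example.org URIs and the title are hypothetical placeholders.

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(catalog, dataset, distribution, title):
    """Return N-Triples lines linking catalog -> dataset -> distribution."""
    triples = [
        (uri(catalog), uri(RDF_TYPE), uri(DCAT + "Catalog")),
        (uri(dataset), uri(RDF_TYPE), uri(DCAT + "Dataset")),
        (uri(distribution), uri(RDF_TYPE), uri(DCAT + "Distribution")),
        # dcat:dataset links a catalog to a dataset it lists
        (uri(catalog), uri(DCAT + "dataset"), uri(dataset)),
        # dcat:distribution links a dataset to one accessible form of it
        (uri(dataset), uri(DCAT + "distribution"), uri(distribution)),
        # the title reuses a Dublin Core term rather than a DCAT one
        (uri(dataset), uri(DCT + "title"), literal(title)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_dataset(
    "https://example.org/catalog",
    "https://example.org/dataset/1",
    "https://example.org/dataset/1/file.csv",
    "Example dataset",
)
print("\n".join(lines))
```

The title triple shows the point made on the slide: DCAT describes the catalog structure and borrows descriptive terms from other vocabularies.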
  • 9. Simple Knowledge Organization System (SKOS)
      SKOS models thesaurus-like resources:
      - skos:Concepts have preferred labels and alternative labels (synonyms) attached to them (skos:prefLabel, skos:altLabel).
      - skos:Concepts can be related to each other with the skos:broader, skos:narrower and skos:related properties.
      - terms and concepts can have more than one broader term or concept.
      SKOS makes it possible to create a semantic layer on top of objects: a network of statements and relationships.
      A major difference between SKOS and formal ontologies is the logical “is-a” hierarchy: in thesauri, the hierarchical relation can represent anything from “is-a” to “part-of”.
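The SKOS model described above can be sketched as a small in-memory structure. The toy vocabulary and the function names below are hypothetical; the sketch only illustrates preferred and alternative labels, multiple broader concepts, and a traversal that follows broader links upwards (in SKOS terms this corresponds to skos:broaderTransitive, since skos:broader itself is not declared transitive).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str                                  # skos:prefLabel
    alt_labels: list = field(default_factory=list)   # skos:altLabel (synonyms)
    broader: list = field(default_factory=list)      # keys of broader concepts


# Hypothetical toy vocabulary; "toy_car" has two broader concepts.
concepts = {
    "vehicle": Concept("Vehicle"),
    "toy": Concept("Toy"),
    "car": Concept("Car", alt_labels=["Automobile"], broader=["vehicle"]),
    "toy_car": Concept("Toy car", broader=["car", "toy"]),
}


def all_broader(key, vocab):
    """Collect every concept reachable by following broader links."""
    seen = set()
    stack = list(vocab[key].broader)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(vocab[current].broader)
    return seen


def find_by_label(label, vocab):
    """Return keys of concepts whose preferred or alternative label matches."""
    wanted = label.lower()
    matches = []
    for key, concept in vocab.items():
        labels = [concept.pref_label] + concept.alt_labels
        if wanted in (text.lower() for text in labels):
            matches.append(key)
    return sorted(matches)


print(sorted(all_broader("toy_car", concepts)))
print(find_by_label("automobile", concepts))
```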
  • 10. RDF graph using the SKOS Core Vocabulary. Source: SKOS Core Guide
  • 11. Global Research Identifier Database (GRID) in SKOS. Can we provide people with a convenient web interface to create links to data points? Can we use machine learning algorithms to predict links and convert data to SKOS automatically?
  • 12. Linked Data integration challenges
      ● datasets are very heterogeneous and multilingual
      ● data usually lacks sufficient quality control
      ● data providers use different modeling schemas and styles
      ● linked data cleansing and versioning are very difficult to track and maintain properly, and web resources aren’t persistent
      ● even modern data repositories provide only metadata records describing data, without giving access to individual data items stored in files
      ● it is difficult to assign entity relationships in a knowledge graph and keep them up to date manually
      We need semantic relationships among metadata fields and their values!
  • 13. What is semantics? Semantics (from Ancient Greek: σημαντικός sēmantikós, "significant") is the study of meaning. The term can be used to refer to subfields of several distinct disciplines including linguistics, philosophy, and computer science.
      Linguistics: In linguistics, semantics is the subfield that studies meaning. Semantics can address meaning at the levels of words, phrases, sentences, or larger units of discourse. One of the crucial questions which unites different approaches to linguistic semantics is that of the relationship between form and meaning.
      Computer science: In computer science, the term semantics refers to the meaning of language constructs, as opposed to their form (syntax). According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared."
      (from Wikipedia)
  • 14. Semantics in Dataverse metadata schema
  • 15. Dataverse datasetfield API
      curl http://localhost:8080/api/admin/datasetfield/title
      To-do list for Dataverse core:
      ● add TermURI for metadata fields (DC)
      ● show external controlled vocabularies available for the specific field
      ● add multilingual support with a ‘lang’ parameter
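The curl call above could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, the base URL is the local default from the slide, and the response is assumed to be JSON. Only the URL helper runs without a live Dataverse instance.

```python
import json
import urllib.parse
import urllib.request


def datasetfield_url(field_name, base_url="http://localhost:8080"):
    """Build the admin API URL for one metadata field, e.g. 'title'."""
    quoted = urllib.parse.quote(field_name, safe="")
    return f"{base_url.rstrip('/')}/api/admin/datasetfield/{quoted}"


def fetch_datasetfield(field_name, base_url="http://localhost:8080", timeout=10):
    """Fetch the field description; needs a running Dataverse instance.

    Assumes the endpoint answers with a JSON document.
    """
    url = datasetfield_url(field_name, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Same URL as in the curl example on the slide.
print(datasetfield_url("title"))
```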
  • 16. Semantic Gateway as plugin application Source: Dataverse gateway
  • 18. Dataverse deposit form with connection to ontologies. Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
  • 19. One metadata field can be linked to many ontologies. The language switch in Dataverse changes the language of the suggested terms!
  • 20. The flexibility of Semantic Gateway Source: Semantic Gateway API
  • 21. Semantic Gateway lookup API
      Scenario: when a user selects a vocabulary and searches for a term, the API takes the filled-in values and returns the list of matching concepts in a standardized format:
      GET /?lang=language&vocab=vocabulary&term=keyword
      Examples:
      GET /?lang=en&vocab=unesco&query=fam
      GET /?vocab=mesh&query=sars
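A lookup request of this shape can be assembled with the standard library. This is only a sketch: the gateway host is a hypothetical placeholder, and because the slide shows both `term` and `query` as the name of the keyword parameter, the name is left configurable here instead of being fixed.

```python
from urllib.parse import urlencode


def lookup_url(base_url, vocab, keyword, lang=None, keyword_param="query"):
    """Build a lookup URL such as /?lang=en&vocab=unesco&query=fam.

    keyword_param is configurable because the slide uses both 'term'
    and 'query' for the search keyword.
    """
    params = {}
    if lang:
        params["lang"] = lang          # optional, as in the MeSH example
    params["vocab"] = vocab
    params[keyword_param] = keyword
    return f"{base_url.rstrip('/')}/?{urlencode(params)}"


# gateway.example.org is a placeholder host, not the real service address.
print(lookup_url("https://gateway.example.org", "unesco", "fam", lang="en"))
print(lookup_url("https://gateway.example.org", "mesh", "sars"))
```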
  • 23. Use case: CMDI, hierarchical metadata schema
      Some conclusions:
      ● Top-level concepts (CMDI components) can share the same concepts
      ● Relations between concepts define the metadata schema
      ● Disambiguation of concepts is complicated
      ● Multilingual components have a language indication (for example, keywords in Dutch)
      ● The hierarchy is defined by semantics
  • 24. Use case: CMDI data model and namespaces
      A default namespace was added in Semantic Gateway for the CMDI schema to keep all relationships between top-level concepts (metadata fields) in the knowledge graph:
      ns.dataverse.org/cmdi_component/cmdi_term
      However, a component or element in CMDI has a unique name among its siblings, so:
      Source: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from CLARIN to Linked Open Data
  • 25. Adding component-specific URIs in SKOS
      The CMDI Component Registry was created for registered components and profiles.
      Example path in CMDI:
      /CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo/metadataCreator/actorInfo/actorType
      ns.dataverse.org/cmdi1/metadataCreator skos:broader ns.dataverse.org/cmdi1/actorInfo
      or simply:
      cmdi1:metadataCreator skos:related cmdi1:corpusProfile
      CMDI concepts could be linked to other SKOS concepts in the next step.
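One way to derive such statements from a CMDI path is sketched below. The function name, the `cmdi1` prefix handling and the skipped `CMD`/`Components` segments are assumptions for illustration. The sketch follows the standard SKOS reading in which `A skos:broader B` means that B is the broader concept, so each child points to its parent; the slide writes its example pair in the opposite order, and it is not clear from the transcript which direction the Semantic Gateway actually uses.

```python
SKOS_BROADER = "skos:broader"


def path_to_broader_triples(path, prefix="cmdi1", skip=("CMD", "Components")):
    """Turn a CMDI component path into (child, skos:broader, parent) triples.

    Assumption: consecutive path segments are parent/child, and the leading
    'CMD' and 'Components' segments carry no concept of their own.
    """
    parts = [part for part in path.strip("/").split("/") if part not in skip]
    return [
        (f"{prefix}:{child}", SKOS_BROADER, f"{prefix}:{parent}")
        for parent, child in zip(parts, parts[1:])
    ]


example_path = (
    "/CMD/Components/corpusProfile/resourceCommonInfo/"
    "metadataInfo/metadataCreator/actorInfo/actorType"
)
triples = path_to_broader_triples(example_path)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```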
  • 26. How can we link CMDI components in SKOS? Source: CMDI Component Registry
  • 27. Export from Dataverse metadata back to CMDI
      Basic requirements: The Dataverse metadata schema should include CMDI metadata that can be extended with the custom components used by CLARIN centers in different countries. Original relationships between fields and concepts should be kept, and custom components should be added to the SKOS schema. Users should be able to download metadata in the original CMDI format without losing quality.
  • 28. The FAIR Signposting Profile
      Herbert Van de Sompel, DANS Chief Innovation Officer, https://hvdsomp.info
      Two levels of access to Web resources:
      ● level one provides a concise or minimal set of links by value, in the HTTP header
      ● level two delivers a complete, comprehensive set of links by reference, that is, in a standalone document (link set)
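Level one can be illustrated by reading typed links from an HTTP `Link` header. The sketch below is a simplified parser written for this illustration, not a full implementation of the header grammar, and the header value, DOI and `example.org` URLs are hypothetical. The relation types `cite-as`, `describedby` and `item` are among those used by Signposting.

```python
import re

# One link: <target> followed by zero or more ;name=value parameters.
LINK_PATTERN = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)'
)
PARAM_PATTERN = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(header):
    """Group link targets by relation type, e.g. {'cite-as': [...]}."""
    links = {}
    for target, params in LINK_PATTERN.findall(header):
        for name, quoted, bare in PARAM_PATTERN.findall(params):
            if name.lower() == "rel":
                # a rel value may hold several space-separated relation types
                for rel in (quoted or bare).split():
                    links.setdefault(rel, []).append(target)
    return links


# Hypothetical header for a dataset landing page.
header = (
    '<https://doi.org/10.5072/FK2/EXAMPLE>; rel="cite-as", '
    '<https://example.org/dataset/1/metadata.jsonld>; rel="describedby"; '
    'type="application/ld+json", '
    '<https://example.org/dataset/1/file.csv>; rel="item"; type="text/csv"'
)
links = parse_link_header(header)
for rel, targets in links.items():
    print(rel, targets)
```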
  • 29. Dataverse meta(data) in FAIR Data Point (FDP)
      ● RESTful web service that enables data owners to expose their datasets using rich machine-readable metadata
      ● Provides standardized descriptions (RDF-based metadata) using controlled vocabularies and ontologies
      ● The FDP spec is public
      Source: FDP
      The goal is to run FDP on the Dataverse side (DCAT, CVs) and provide metadata export in RDF!
  • 30. Questions? Slava Tykhonov, Senior Information Scientist vyacheslav.tykhonov@dans.knaw.nl