SlideShare a Scribd company logo
1 of 19
dans.knaw.nl
DANS is een instituut van KNAW en NWO
Data standardization process
for arts and humanities
Vyacheslav Tykhonov
Senior Information Scientist
(DANS-KNAW, Netherlands)
Developing the SSHOC Reference Ontology workshop
ICS-FORTH , Heraklion, Crete
21-22 May, 2019
DANS-KNAW core services
Outline
• Standardization process during data deposit and archiving
(metadata level created by users)
• Research data management and harmonization of deposited
datasets (file level)
• Standardization and enrichment of harvested content (metadata
level provided by different data providers)
• Tracking provenance information for data and tools, moving to FAIR
Big problem: researchers and librarians are not talking to each other
and there is no common Reference model!
Metadata schemas
• EASY TDR has own metadata schema developed for Dutch
scientific landscape but allows Dublin Core export from OAI-
PMH endpoint
• NARCIS is an aggregator that harvesting metadata from
various repositories, no standardization pipeline
• Metadata from Dataverse can be exported as:
Controlled vocabulary and thesaurus
• Linked data is one step forward (or actually backward in the right
direction) on solving some of standardization problems.
• By having shared controlled vocabularies (CV) created and
maintained by experts on various domains, the digital items can
be annotated with them and easily retrieved by other experts
from the same domain without being librarian. It’s clear
indication which vocabulary is good enough and shared by a
critical mass.
• A thesaurus is a semantic network of unique concepts, including
relationships between synonyms, broader and narrower
(parent/child) contexts, and other related concepts. Thesaurus is
hierarchy for controlled vocabularies.
SSHOC data repository
DANS-KNAW is leading the development of SSHOC DataverseEU project.
We’re developing multilingual web interface and localizing metadata fields and developed data
standardization technique based on APIs for CESSDA CVs, Topic Classification and CESSDA CV Manager
services.
SSHOC/CESSDA DataverseEU:
• Hungary (TARKI)
• Sweden (SND)
• Slovenia (ADP)
• Germany (GESIS)
• France (SciencesPro)
• Austria (AUSSDA)
• United Kingdom (UKDA)
• Italy (CNR, UniData)
• Belgium (SODA)
• Latvia (LSZDA)
• Poland (PSNC)
• Norway (DataverseNO)
• Netherlands (DANS-KNAW)
SKOS RDF Vocabularies (CESSDA)
We’re importing thesaurus delivered as SKOS RDF, for example:
Rest API endpoint delivers back JSON suitable for web applications.
Metadata standardization during deposit process
Standardized metadata in Dataverse
Standardized metadata in RDF
All relations exported and available in the Knowledge Graph
and ready for the further querying and exploration:
Research data management
Data standardization process plays a key role in the data
management plan of any organization but current situation in
research data management is very complex:
• too much data chaos in datasets
• no data transparency
• sometimes no standards available
• no provenance information attached to data
• homonyms, synonyms, generalizations, specializations,
spelling variations and mistakes, language versions are all
complicating the keyword-based search and retrieval of
information
Data standardization pipeline based on
chatbot
Mapping produced by AI as result
mappings:
Image-image:
predicateobjects:
- [a, 'http://xmlns.com/foaf/0.1/Image']
- [a, 'http://schema.org/ImageObject']
- [a, 'http://schema.org/CreativeWork']
- [a, 'http://xmlns.com/foaf/0.1/Document']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(image)]
- ['http://schema.org/image', $(image)]
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image)
Person-name:
predicateobjects:
- [a, 'http://schema.org/Person']
- [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person']
- [a, 'http://xmlns.com/foaf/0.1/Person']
- [a, 'http://purl.org/dc/terms/Agent']
- [a, 'http://purl.org/goodrelations/v1#BusinessEntity']
- [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson']
- [a, 'http://schema.org/Thing']
- [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing']
- [a, 'http://xmlns.com/foaf/0.1/Agent']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(name)]
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name)
Place-birth_cty:
predicateobjects:
- [a, 'http://schema.org/Place']
- [a, 'http://purl.org/goodrelations/v1#Location']
- [a, 'http://rdfs.co/juso/SpatialThing']
- [a, 'http://schema.org/Thing']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)]
- ['http://dbpedia.org/ontology/era', $(era)]
- ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)]
- ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#datetime']
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty)
Place-birth_prv:
predicateobjects:
- [a, 'http://schema.org/Place']
- [a, 'http://purl.org/goodrelations/v1#Location']
- [a, 'http://rdfs.co/juso/SpatialThing']
- [a, 'http://schema.org/Thing']
- ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)]
- ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#datetime']
source: dataset-source
subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
NARCIS metadata (example)
No authority linking or controlled vocabularies support, but…
Tracking Provenance information
Prov-O example from PARTHENOS project
Time Machine association
• large scale project with 300+
partners
• development and support of
sustainable networked services
• trends watching and tracking of
software maturity level
• reliable governance model
• the foundation for the further
innovation!
Conclusion
• development of large-scale networked services out of research
pipelines
• every service should be mature enough, maintainable and follow
continuous integration pipeline
• tracking provenance information for every tool and dataset is the
highest priority
• creation and governance of standardization pipelines based on
services providing access to domain specific controlled vocabularies
and ontologies
• providing access to data, metadata and provenance (processes) in the
Knowledge Graph
• further integration of services maintained by different partners and
deployed in the Cloud
Questions?
Feel free to ask questions!
Vyacheslav (Slava) Tykhonov
e-mail: vyacheslav.tykhonov@dans.knaw.nl
website: http://dans.knaw.nl (DANS-KNAW)

More Related Content

What's hot

Data(base) taxonomy
Data(base) taxonomyData(base) taxonomy
Data(base) taxonomyDejan Radic
 
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...4Science
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15Michele Mennielli
 
DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15Michele Mennielli
 
Ado Fundamentals
Ado FundamentalsAdo Fundamentals
Ado Fundamentalsasim78
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsAhmad Assaf
 
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...Jens Mittelbach
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataDLFCLIR
 
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Jan Polowinski
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talkbenosteen
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLCredential Engine
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...ariadnenetwork
 

What's hot (20)

No sql
No sqlNo sql
No sql
 
Data(base) taxonomy
Data(base) taxonomyData(base) taxonomy
Data(base) taxonomy
 
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
How to enhance your DSpace repository: use cases for DSpace-CRIS, DSpace-RDM,...
 
Database
DatabaseDatabase
Database
 
Dkan
DkanDkan
Dkan
 
DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15DSpace-CRIS: An Open Source Solution for Research - @THETA15
DSpace-CRIS: An Open Source Solution for Research - @THETA15
 
DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15DSpace-CRIS_An open source solution for Research_EDU15
DSpace-CRIS_An open source solution for Research_EDU15
 
Ado Fundamentals
Ado FundamentalsAdo Fundamentals
Ado Fundamentals
 
I say NoSQL you say what
I say NoSQL you say whatI say NoSQL you say what
I say NoSQL you say what
 
HDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data PortalsHDL - Towards A Harmonized Dataset Model for Open Data Portals
HDL - Towards A Harmonized Dataset Model for Open Data Portals
 
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
d:swarm - A Library Data Management Platform Based on a Linked Open Data Appr...
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and Microdata
 
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
Data Management and Integration with d:swarm (Lightning talk, ELAG 2014)
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
 
Expressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDLExpressing Concept Schemes & Competency Frameworks in CTDL
Expressing Concept Schemes & Competency Frameworks in CTDL
 
The Danish National Bibliography as LOD
The Danish National Bibliography as LODThe Danish National Bibliography as LOD
The Danish National Bibliography as LOD
 
Web Spa
Web SpaWeb Spa
Web Spa
 
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
Achille Felicetti "Introduction to the Ariadne winter school and to the ARIAD...
 
Cassandra Learning
Cassandra LearningCassandra Learning
Cassandra Learning
 
LODLAM Landscape
LODLAM LandscapeLODLAM Landscape
LODLAM Landscape
 

Similar to Data standardization process for social sciences and humanities

Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...vty
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And VisualizationIvan Ermilov
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataMarin Dimitrov
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projectszsrlibrary
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
PoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewPoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewSemantic Web Company
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar SeriesKatina Toufexis
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM4Science
 
lodlam summit session browsable linked data
lodlam summit session browsable linked datalodlam summit session browsable linked data
lodlam summit session browsable linked dataEnno Meijers
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationDenodo
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Stefan Dietze
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hubvty
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 

Similar to Data standardization process for social sciences and humanities (20)

Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...Building an electronic repository and archives on Dataverse in the European O...
Building an electronic repository and archives on Dataverse in the European O...
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Linked Data to Improve the OER Experience
Linked Data to Improve the OER ExperienceLinked Data to Improve the OER Experience
Linked Data to Improve the OER Experience
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projects
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
PoolParty Semantic Platform - Overview
PoolParty Semantic Platform - OverviewPoolParty Semantic Platform - Overview
PoolParty Semantic Platform - Overview
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
2017 05 03 Implementing Pure at UWA - ANDS Webinar Series
 
Putting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAMPutting Historical Data in Context: how to use DSpace-GLAM
Putting Historical Data in Context: how to use DSpace-GLAM
 
lodlam summit session browsable linked data
lodlam summit session browsable linked datalodlam summit session browsable linked data
lodlam summit session browsable linked data
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
 
Databases for Data Science
Databases for Data ScienceDatabases for Data Science
Databases for Data Science
 
mx & dbs
mx & dbsmx & dbs
mx & dbs
 
Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)Linked Data for Architecture, Engineering and Construction (AEC)
Linked Data for Architecture, Engineering and Construction (AEC)
 
DataverseNL as structured data hub
DataverseNL as structured data hubDataverseNL as structured data hub
DataverseNL as structured data hub
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 

More from vty

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs vty
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs vty
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure vty
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museumvty
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataversevty
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...vty
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7vty
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes vty
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21vty
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryvty
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligencevty
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Projectvty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataversevty
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 

More from vty (20)

Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs Decentralised identifiers and knowledge graphs
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
Decentralisation and knowledge graphs Decentralisation and knowledge graphs
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure Decentralised identifiers for CLARIAH infrastructure
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
Dataverse repository for research data  in the COVID-19 MuseumDataverse repository for research data  in the COVID-19 Museum
Dataverse repository for research data in the COVID-19 Museum
 
Metaverse for Dataverse
Metaverse for DataverseMetaverse for Dataverse
Metaverse for Dataverse
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repository
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
Fighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial IntelligenceFighting COVID-19 with Artificial Intelligence
Fighting COVID-19 with Artificial Intelligence
 
Building COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science ProjectBuilding COVID-19 Museum as Open Science Project
Building COVID-19 Museum as Open Science Project
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Ontologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and DataverseOntologies, controlled vocabularies and Dataverse
Ontologies, controlled vocabularies and Dataverse
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 

Recently uploaded

Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 

Recently uploaded (20)

Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 

Data standardization process for social sciences and humanities

  • 1. dans.knaw.nl DANS is een instituut van KNAW en NWO Data standardization process for arts and humanities Vyacheslav Tykhonov Senior Information Scientist (DANS-KNAW, Netherlands) Developing the SSHOC Reference Ontology workshop ICS-FORTH , Heraklion, Crete 21-22 May, 2019
  • 3. Outline • Standardization process during data deposit and archiving (metadata level created by users) • Research data management and harmonization of deposited datasets (file level) • Standardization and enrichment of harvested content (metadata level provided by different data providers) • Tracking provenance information for data and tools, moving to FAIR Big problem: researchers and librarians are not talking to each other and there is no common Reference model!
  • 4. Metadata schemas • EASY TDR has own metadata schema developed for Dutch scientific landscape but allows Dublin Core export from OAI- PMH endpoint • NARCIS is an aggregator that harvesting metadata from various repositories, no standardization pipeline • Metadata from Dataverse can be exported as:
  • 5. Controlled vocabulary and thesaurus • Linked data is one step forward (or actually backward in the right direction) on solving some of standardization problems. • By having shared controlled vocabularies (CV) created and maintained by experts on various domains, the digital items can be annotated with them and easily retrieved by other experts from the same domain without being librarian. It’s clear indication which vocabulary is good enough and shared by a critical mass. • A thesaurus is a semantic network of unique concepts, including relationships between synonyms, broader and narrower (parent/child) contexts, and other related concepts. Thesaurus is hierarchy for controlled vocabularies.
  • 6. SSHOC data repository DANS-KNAW is leading the development of SSHOC DataverseEU project. We’re developing multilingual web interface and localizing metadata fields and developed data standardization technique based on APIs for CESSDA CVs, Topic Classification and CESSDA CV Manager services. SSHOC/CESSDA DataverseEU: • Hungary (TARKI) • Sweden (SND) • Slovenia (ADP) • Germany (GESIS) • France (SciencesPro) • Austria (AUSSDA) • United Kingdom (UKDA) • Italy (CNR, UniData) • Belgium (SODA) • Latvia (LSZDA) • Poland (PSNC) • Norway (DataverseNO) • Netherlands (DANS-KNAW)
  • 7. SKOS RDF Vocabularies (CESSDA) We’re importing thesaurus delivered as SKOS RDF, for example: Rest API endpoint delivers back JSON suitable for web applications.
  • 10. Standardized metadata in RDF All relations exported and available in the Knowledge Graph and ready for the further querying and exploration:
  • 11. Research data management Data standardization process plays a key role in the data management plan of any organization but current situation in research data management is very complex: • too much data chaos in datasets • no data transparency • sometimes no standards available • no provenance information attached to data • homonyms, synonyms, generalizations, specializations, spelling variations and mistakes, language versions are all complicating the keyword-based search and retrieval of information
  • 12. Data standardization pipeline based on chatbot
  • 13. Mapping produced by AI as result mappings: Image-image: predicateobjects: - [a, 'http://xmlns.com/foaf/0.1/Image'] - [a, 'http://schema.org/ImageObject'] - [a, 'http://schema.org/CreativeWork'] - [a, 'http://xmlns.com/foaf/0.1/Document'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(image)] - ['http://schema.org/image', $(image)] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Image/$(image) Person-name: predicateobjects: - [a, 'http://schema.org/Person'] - [a, 'http://www.w3.org/2000/10/swap/pim/contact#Person'] - [a, 'http://xmlns.com/foaf/0.1/Person'] - [a, 'http://purl.org/dc/terms/Agent'] - [a, 'http://purl.org/goodrelations/v1#BusinessEntity'] - [a, 'http://rhizomik.net/ontologies/copyrightonto.owl#LegalPerson'] - [a, 'http://schema.org/Thing'] - [a, 'http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing'] - [a, 'http://xmlns.com/foaf/0.1/Agent'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(name)] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Person/$(name) Place-birth_cty: predicateobjects: - [a, 'http://schema.org/Place'] - [a, 'http://purl.org/goodrelations/v1#Location'] - [a, 'http://rdfs.co/juso/SpatialThing'] - [a, 'http://schema.org/Thing'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_cty)] - ['http://dbpedia.org/ontology/era', $(era)] - ['http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#isDescribedBy', $(era)] - ['http://dbpedia.org/ontology/birthDate', $(birth), 'http://www.w3.org/2001/XMLSchema#datetime'] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_cty) Place-birth_prv: predicateobjects: - [a, 'http://schema.org/Place'] - [a, 'http://purl.org/goodrelations/v1#Location'] - [a, 'http://rdfs.co/juso/SpatialThing'] - [a, 'http://schema.org/Thing'] - ['http://www.w3.org/2000/01/rdf-schema#label', $(birth_prv)] - ['http://dbpedia.org/ontology/deathDate', $(death), 'http://www.w3.org/2001/XMLSchema#datetime'] source: dataset-source subject: https://data.opendatasoft.com/ld/resources/roman-emperors@public/Place/$(birth_prv)
  • 14. NARCIS metadata (example) No authority linking or controlled vocabularies support, but…
  • 16. Prov-O example from PARTHENOS project
  • 17. Time Machine association • large scale project with 300+ partners • development and support of sustainable networked services • trends watching and tracking of software maturity level • reliable governance model • the foundation for the further innovation!
  • 18. Conclusion • development of large-scale networked services out of research pipelines • every service should be mature enough, maintainable and follow continuous integration pipeline • tracking provenance information for every tool and dataset is the highest priority • creation and governance of standardization pipelines based on services providing access to domain specific controlled vocabularies and ontologies • providing access to data, metadata and provenance (processes) in the Knowledge Graph • further integration of services maintained by different partners and deployed in the Cloud
  • 19. Questions? Feel free to ask questions! Vyacheslav (Slava) Tykhonov e-mail: vyacheslav.tykhonov@dans.knaw.nl website: http://dans.knaw.nl (DANS-KNAW)