
Building an electronic repository and archives on Dataverse in the European Open Science Cloud



Presentation for the XVIII International Scientific and Practical Conference

Published in: Technology


  1. Building an electronic repository and archives on Dataverse in the European Open Science Cloud. Vyacheslav Tykhonov, Senior Information Scientist, Data Archiving and Networked Services (DANS-KNAW, Netherlands; DANS is an institute of KNAW and NWO). XVIII International Scientific and Practical Conference "Building of Information Society: Resources and Technologies", September 19, 2019, Kyiv
  2. About me
     • born in Kyiv in 1979
     • studied at the National Technical University of Ukraine – Kyiv Polytechnic Institute (MSc, 2002)
     • worked for international search engine companies and media monitoring agencies (1999-2010)
     • joined the Royal Netherlands Academy of Arts and Sciences (KNAW) in 2011
     • Senior Data Scientist at DANS-KNAW since 2016
     • currently leading the technical development of the DataverseEU cloud efforts in SSHOC Dataverse and other projects
  3. DANS-KNAW core services
  4. Why Dataverse?
     • Open source project developed by IQSS at Harvard University and published on GitHub
     • Great product with a long history (since 2006)
     • Very dynamic and experienced development team working in an Agile environment (a community call is scheduled every two weeks)
     • Clear vision and understanding of research communities' requirements, with a public roadmap
     • The strong community behind Dataverse helps to improve the basic functionality and develop it further
     • Dataverse has been selected as a data repository infrastructure by countries from all continents
     • Well-developed architecture with rich API endpoints for building application layers around Dataverse
  5. Dataverse and API economy
     Dataverse is a data repository platform with four APIs:
     - Native API
     - SWORD API
     - Search API
     - Data Access API
     An API token is the key to connecting Dataverse with an unlimited number of tools developed by different research communities and to integrating it with other repositories.
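To make the role of the API token concrete, here is a minimal Python sketch of an authenticated Search API request. The host name and token are placeholders invented for this example, not values from the slides. The request is only built, not sent, so the sketch runs without a live Dataverse installation. It assumes the `/api/search` path and the `X-Dataverse-key` header described in the Dataverse API guide.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values for illustration only; replace with a real installation and token.
BASE_URL = "https://dataverse.example.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"


def build_search_request(query, base_url=BASE_URL, token=API_TOKEN, per_page=10):
    """Build (but do not send) a GET request against the Dataverse Search API."""
    # q, type and per_page are query parameters of the Search API.
    params = urlencode({"q": query, "type": "dataset", "per_page": per_page})
    request = Request(f"{base_url}/api/search?{params}")
    # The API token travels in the X-Dataverse-key HTTP header.
    request.add_header("X-Dataverse-key", token)
    return request


request = build_search_request("open science")
print(request.full_url)
```

Against a real installation, the request could be sent with `urllib.request.urlopen(request)` and the JSON response parsed with `json.load`. The same token header is used by the Native and Data Access APIs.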
  6. DataverseNL as a shared service
  7. Datasets container for Leiden University
  8. DataverseNL as a collaboration platform
     • DataverseNL is a shared service provided by the participating institutions and DANS. DANS performs back office tasks, including server and software maintenance and administrative support.
     • The participating institutions are responsible for managing the deposited data and the content. Every institution has its own data manager.
     • User friendly: users at participating institutions simply log in and DataverseNL is ready for use.
     • Reliable and safe: in cooperation with the participating institutions and universities, standard procedures have been established which ensure sound data management. Data are stored in the Netherlands.
     • Accessible: the service can be accessed online, from anywhere and at any time. Just open!
  9. Dataset submission form
  10. Published dataset in Ukrainian
  11. SSHOC DataverseEU project
     SSHOC stands for Social Sciences and Humanities Open Cloud. The goal of the SSHOC Dataverse project (CESSDA, DARIAH and CLARIN) is to create a reliable, production-ready open source data infrastructure that everybody can install and reuse for their own needs and requirements. We are developing a multilingual web interface, localizing metadata fields, and have developed a data standardization technique based on the APIs of the CESSDA CVs, Topic Classification and CESSDA CV Manager services.
     DataverseEU countries:
     • Hungary (TARKI)
     • Sweden (SND)
     • Slovenia (ADP)
     • Germany (GESIS)
     • France (Sciences Po)
     • Austria (AUSSDA)
     • United Kingdom (UKDA)
     • Italy (UniData)
     • Belgium (SODA)
     • Latvia (LSZDA)
     • Netherlands (DANS-KNAW)
  12. Development process
     The SSHOC Dataverse project has two parallel development tracks:
     • The core development team is working on the modification and extension of the Dataverse core functionality.
     • The application development team will create new tools or integrate existing tools, which will be published on the Dataverse App Store website.
     Our goal is to build a distributed and mature data infrastructure based on sustainable microservices.
  13. Maturity evaluation of DataverseEU services
     • the testing process should be compliant with the CESSDA services maturity model
     • every change to Dataverse functionality should be covered by a unit test; changes to external functionality should get Selenium scenarios
     • the service should score as high as possible according to the CESSDA maturity model
  14. Services in the European Open Science Cloud (EOSC)
     • EOSC requires at least maturity level 8
     • we need the highest quality of software to be accepted as a service
     • clear and transparent evaluation of services is essential
     • evidence of technical maturity is the key to success
     • a limited warranty will make it possible to stop out-of-warranty services
  15. Research data management
     The data standardization process plays a key role in the data management plan of any organization, but the current situation in research data management is very complex:
     • too much data chaos in datasets
     • no data transparency
     • sometimes no standards available
     • no provenance information attached to data
     • homonyms, synonyms, generalizations, specializations, spelling variations and mistakes, and language versions all complicate keyword-based search and retrieval of information
  16. Controlled vocabulary and thesaurus
     • Linked data is one step forward (or actually backward in the right direction) in solving some of the standardization problems.
     • With shared controlled vocabularies (CVs) created and maintained by experts in various domains, digital items can be annotated with them and easily retrieved by other experts from the same domain without being a librarian. It is a clear indication of which vocabulary is good enough and shared by a critical mass.
     • A thesaurus is a semantic network of unique concepts, including relationships between synonyms, broader and narrower (parent/child) contexts, and other related concepts. A thesaurus provides the hierarchy for controlled vocabularies.
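As a rough illustration of how a controlled vocabulary and a thesaurus help with synonyms and spelling variations, here is a minimal Python sketch. The three concepts, their synonyms and all function names are invented for this example. They are not taken from the CESSDA vocabularies or from Dataverse.

```python
# Hypothetical mini-thesaurus: each preferred concept has a broader (parent)
# concept and a set of synonyms or spelling variants.
THESAURUS = {
    "economics": {"broader": None, "synonyms": {"economy"}},
    "labour market": {"broader": "economics", "synonyms": {"labor market", "job market"}},
    "unemployment": {"broader": "labour market", "synonyms": {"joblessness"}},
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym to its preferred concept."""
    lookup = {}
    for concept, entry in thesaurus.items():
        lookup[concept] = concept
        for synonym in entry["synonyms"]:
            lookup[synonym] = concept
    return lookup


def normalize(keyword, lookup):
    """Return the controlled term for a free-text keyword, or None if unknown."""
    return lookup.get(keyword.strip().lower())


def broader_chain(concept, thesaurus):
    """Walk the parent links from a concept up to the top of the hierarchy."""
    chain = []
    parent = thesaurus[concept]["broader"]
    while parent is not None:
        chain.append(parent)
        parent = thesaurus[parent]["broader"]
    return chain


LOOKUP = build_lookup(THESAURUS)
print(normalize(" Job Market ", LOOKUP))          # labour market
print(broader_chain("unemployment", THESAURUS))   # ['labour market', 'economics']
```

Keywords that map to `None` are exactly the ones that would need review by a domain expert before being added to the vocabulary.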
  17. CESSDA CV Service
  18. External controlled vocabularies in Dataverse
  19. Standardized metadata in Dataverse
  20. Weblate as a multilingual support service
  21. Managing translations with Weblate
  22. Questions? Contact me: Slava Tykhonov. Watch the SSHOC Dataverse presentation at Harvard University: Try now! and (Ukrainian portal) (application source code) (Cloud release for Kubernetes)