Building COVID-19
Museum as Open
Science project
Slava Tykhonov,
DANS-KNAW, Royal Netherlands Academy of
Arts and Sciences
Doctoral seminar Covid-19 Museum, Université de Paris / 09.04.2021
About me: DANS-KNAW projects (2016-2021)
● CLARIAH+ (ongoing)
● EOSC Synergy (ongoing)
● SSHOC Dataverse (ongoing)
● CESSDA Dataverse Europe 2018
● Time Machine Europe Supervisor at DANS-KNAW
● PARTHENOS Horizon 2020
● CESSDA PID (Persistent Identifiers) Horizon 2020
● CLARIAH
● RDA (Research Data Alliance) PITTS Horizon 2020
● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020
Source: LinkedIn
Moving towards Open Science
Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017)
Time Machine project
● An international collaboration to bring 5000 years of European history to
life
● Digitising millions of historical documents, paintings and monuments
● The largest computer simulation ever developed
● An open access, interactive resource
● 600+ consortium members from the European countries
● top academic and research institutions
● Private sector partners from SMEs to international companies
“Our focus is on the joint efforts on Big Data, artificial intelligence,
augmented reality and 3D and the development of European
platforms in line with European values”.
Visit http://timemachine.eu
Data Management in the Time Machine(s)
● Primary Data (Objects) should be preserved in the Digital
Archive with persistent identifiers (Trusted Digital Repository)
● Secondary Data will be stored in the research infrastructure
together with data versioning and provenance information
● Linked Open Data Cloud (LOD) will provide the layer of
interoperability
● Reuse of research data should follow FAIR principles – all
research data should be Findable, Accessible, Interoperable
and Reusable
DANS-KNAW is one of the worldwide leaders in FAIR Data (FAIRsFAIR)
Collaboration and best practice data sharing
Merce Crosas, “Harvard Data Commons”
We need a horizontal platform to serve vertical teams
Source: CoronaWhy organization
What is Dataverse?
● Open source data repository developed by IQSS of Harvard University
● Great product with a very long history (since 2006), created by an experienced
Agile development team
● Clear vision and understanding of research communities' requirements, public
roadmap
● Well-developed architecture with rich APIs that allows building application layers
around Dataverse
● A strong community behind Dataverse helps to improve the basic
functionality and develop it further.
● DANS-KNAW leads the SSHOC WP5.2 task to deliver a production-ready
Dataverse repository for the European Open Science Cloud (EOSC)
communities CESSDA, CLARIN and DARIAH.
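As a rough illustration of the "rich APIs" point, the sketch below builds a query for the Dataverse Search API and summarizes a decoded response. The `/api/search` endpoint with its `q`, `type` and `per_page` parameters belongs to the Dataverse native API; the host `demo.dataverse.org`, the helper names and the hand-written sample payload are assumptions made only for this example.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Return a Dataverse Search API URL for the given query string."""
    params = urlencode({"q": query, "type": item_type, "per_page": per_page})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def summarize_items(response):
    """Extract (name, global_id) pairs from a decoded Search API response."""
    items = response.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


# Hypothetical payload that only mimics the shape of a real response.
sample_response = {
    "status": "OK",
    "data": {
        "total_count": 1,
        "items": [
            {
                "name": "COVID-19 survey",
                "type": "dataset",
                "global_id": "doi:10.1234/EXAMPLE",
            },
        ],
    },
}

search_url = build_search_url("https://demo.dataverse.org", "covid-19")
summary = summarize_items(sample_response)
```

An application layer would fetch `search_url` with any HTTP client and pass the decoded JSON to `summarize_items`; the network call is left out here so the sketch stays self-contained.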
Federated Dataverse data repositories worldwide
Source: Merce Crosas, Harvard Data Commons
Benefits of the Common Data Infrastructure
● It’s distributed and sustainable, suitable for the future
● maintenance costs will drop substantially: the more organizations join,
the less expensive it will be to support
● maintenance costs could be reallocated to training and further
development of new (common) features
● reuse of the same infrastructure components will improve the quality and
the speed of knowledge exchange
● building multidisciplinary teams that reuse the same infrastructure can bring us
new insights and unexpected views
● Common Data Infrastructure plays the role of a universal gravitation
layer for Data Science projects
(and so on…)
DataverseNL collaborative data network
Source: https://www.dataverse.nl
Data Stations - Future Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
Open vs Closed Innovation
Coming back to FAIR principles
Source:
Mercè Crosas,
“FAIR principles and beyond:
implementation in Dataverse”
Interoperability in the European Open Science Cloud (EOSC)
● Technical interoperability is defined as the “ability of different information technology
systems and software applications to communicate and exchange data”. It should allow
“to accept data from each other and perform a given task in an appropriate and
satisfactory manner without the need for extra operator intervention”.
● Semantic interoperability is “the ability of computer systems to transmit data with
unambiguous, shared meaning. Semantic interoperability is a requirement to enable
machine computable logic, inferencing, knowledge discovery, and data”.
● Organisational interoperability refers to the “way in which organisations align their
business processes, responsibilities and expectations to achieve commonly agreed and
mutually beneficial goals. Focus on the requirements of the user community by making
services available, easily identifiable, accessible and user-focused”.
● Legal interoperability covers “the broader environment of laws, policies, procedures and
cooperation agreements”
Source: EOSC Interoperability Framework v1.0
Our goals to increase interoperability on the global scale
Provide a custom FAIR metadata schema for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from CLARIN
linguistics community
Connect metadata to ontologies and external controlled vocabularies:
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for the specific fields
● provide multilingual access to controlled vocabularies
All contributions should go into the Dataverse source code and be available
worldwide!
The importance of standards and ontologies
Generic controlled vocabularies to link metadata in the bibliographic collections are well
known: ORCID, GRID, GeoNames, Getty.
Medical knowledge graphs powered by:
● Biological Expression Language (BEL)
● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NLM)
● Wikidata (Open ontology) - Wikipedia
Integration based on metadata standards:
● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
Most prominent ontologies are already available as web services with API
endpoints.
SKOSMOS framework to discover ontologies
● SKOSMOS is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● support for multilingual vocabularies
● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
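The multilingual search described above can be sketched as a small client for a Skosmos-style REST interface. The sketch assumes the `/rest/v1/search` endpoint with `query`, `lang` and `vocab` parameters and a `results` list carrying `prefLabel` and `uri` fields; the host `skosmos.example.org`, the vocabulary id and the sample response are hypothetical.

```python
from urllib.parse import urlencode


def build_concept_search_url(base_url, term, lang="en", vocab=None):
    """Build a Skosmos-style REST search URL for a term in a given language."""
    params = {"query": term, "lang": lang}
    if vocab is not None:
        params["vocab"] = vocab
    return f"{base_url.rstrip('/')}/rest/v1/search?{urlencode(params)}"


def suggestions(response):
    """Map each search hit to a (label, concept URI) pair for a metadata form."""
    return [(hit["prefLabel"], hit["uri"]) for hit in response.get("results", [])]


# Hypothetical response; a real service returns more fields per hit.
sample_response = {
    "results": [
        {
            "uri": "http://example.org/concept/c1",
            "prefLabel": "coronavirus",
            "lang": "fr",
        },
    ],
}

url_fr = build_concept_search_url(
    "https://skosmos.example.org", "coronavirus", lang="fr", vocab="mesh"
)
terms = suggestions(sample_response)
```

Changing only the `lang` argument requests labels in another language, which is the mechanism a metadata form could rely on when the interface language is switched.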
The same metadata field linked to many ontologies
Language switch in Dataverse will change the language of suggested terms!
Use case: COVID-19 expert questions
Source: Epidemic Questions Answering
“In response to the COVID-19 pandemic, the Epidemic Question Answering (EPIC-QA) track challenges teams to develop
systems capable of automatically answering ad-hoc questions about the disease COVID-19, its causal virus SARS-CoV-2,
related corona viruses, and the recommended response to the pandemic. While COVID-19 has been an impetus for a
large body of emergent scientific research and inquiry, the response to COVID-19 raises questions for consumers.”
COVID-19 questions in SKOSMOS framework
COVID-19 questions in Dataverse metadata
Source: COVID-19 European data hub in Harvard Dataverse
● COVID-19 ontologies can be hosted by
SKOSMOS framework
● Researchers can enrich metadata by
adding standardized questions provided
by SKOSMOS ontologies
● rich metadata is exported back to the Linked
Open Data Cloud to increase its chance
of being found
● enriched metadata can be used for
training further ML models
Weblate as a localization service
● Localization/internationalization is key in Large
Scale projects
● Weblate allows content to be translated in a
structured way
● several options for project visibility: accept
translations by the crowd, or only give access
to a select group of translators.
● Weblate indicates untranslated strings, strings
with failing checks, and strings that need
approval.
● when new strings are added, Weblate can
indicate which strings are new and
untranslated.
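The idea of flagging untranslated and new strings can be illustrated with a plain comparison of two string catalogues. This is not how Weblate itself is implemented; the function name, the keys and the sample catalogues are hypothetical.

```python
def translation_status(source, translated):
    """Split source keys into translated and untranslated; list obsolete keys."""
    done = sorted(k for k in source if translated.get(k, "").strip())
    untranslated = sorted(k for k in source if not translated.get(k, "").strip())
    obsolete = sorted(k for k in translated if k not in source)
    return {"translated": done, "untranslated": untranslated, "obsolete": obsolete}


# Hypothetical interface strings: English source and a partial French catalogue.
source_en = {
    "dataset.title": "Title",
    "dataset.author": "Author",
    "dataset.keyword": "Keyword",  # newly added, not yet in the French catalogue
}
translated_fr = {
    "dataset.title": "Titre",
    "dataset.author": "",          # present but still empty
    "dataset.old": "Ancien",       # no longer present in the source
}

status = translation_status(source_en, translated_fr)
```

Here both the empty string and the missing key end up in the untranslated group, which is the information a translator needs first.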
Visit http://translation.cessda.eu
Dataverse App Store
Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render,
audio, JSON, GeoJSON/Shapefiles/Map, XML
Interoperability: external controlled vocabularies
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance including SPARQL endpoint (FDP)
Federated login: eduGAIN, PIONIER ID
CLARIN Switchboard integration: Natural Language Processing tools
Visualization tools (maps, charts, timelines)
Dataverse Spreadsheet Previewer
Dataverse and CLARIN tools integration
Using Artificial Intelligence and Machine Learning
1300+ people registered in the organization
Historically, most datasets have been preserved in data silos (archives), not interlinked
and lacking standardization. There are cultural, structural and
technological challenges.
Solutions:
● Integrating Linked Data and Semantic Web technologies, forcing research
communities to share data and add more interoperability following FAIR
principles
● Create a standardized (meta)data layer for Large Scale projects like Time
Machine and CoronaWhy
● Working on the automatic metadata linkage to ontologies and external
controlled vocabularies
Supporting Semantic Web solutions
Why Artificial Intelligence?
Human resources are very expensive and scarce; it’s difficult to find
appropriate expertise in-house.
Solution:
● Building AI/ML pipelines for the automatic metadata enrichment and
linkage prediction
● applying NLP for NER, data mining, topic classification, etc.
● building multidisciplinary knowledge graphs should facilitate the
development of new projects with economic and social scientists; they will
take ownership of their own data if they see value (Clio Infra)
Using modern NLP frameworks (spaCy, for example)
Recognised entities can form a metadata layer and be stored in Dataverse
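A minimal sketch of how recognised entities could be turned into a metadata layer is given below. It assumes the NER step has already produced (text, label) pairs; the function name, the labels and the sample entities are hypothetical, and the step that stores the result in Dataverse is left out.

```python
from collections import defaultdict


def entities_to_metadata(entities):
    """Group (text, label) pairs into a label -> sorted unique values mapping."""
    grouped = defaultdict(set)
    for text, label in entities:
        grouped[label].add(text.strip())
    return {label: sorted(values) for label, values in grouped.items()}


# Hand-written pairs standing in for NER output. With spaCy they could be
# collected as [(ent.text, ent.label_) for ent in nlp(text).ents].
sample_entities = [
    ("SARS-CoV-2", "VIRUS"),
    ("Wuhan", "GPE"),
    ("SARS-CoV-2", "VIRUS"),
    ("Paris", "GPE"),
]

metadata_layer = entities_to_metadata(sample_entities)
```

Deduplicating and sorting the values keeps the resulting metadata stable across repeated runs over the same text.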
How to control Artificial Intelligence
Problem:
It’s naive to fully trust Machine Learning and AI; we need to support “human
in the loop” processes to take control over automatic workflows. Ethics is
also important, as is the fake detection problem.
Solution:
A lot of “human in the loop” tools have already been developed in research projects. We
need to support the best ones for the different use cases, bring them to the appropriate
maturity (for example, with CI/CD) and introduce them to research
communities.
Human in the loop
General blueprint for a human-in-the-loop interactive AI system. Credits: Stanford University HAI
The question shifts from “how do we build a smarter system?” to “how do we incorporate useful,
meaningful human interaction into the system?”
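One common way to keep a human in the loop is to accept only high-confidence predictions automatically and queue the rest for review. The sketch below shows that routing step under stated assumptions: the threshold value, the function name and the sample predictions are hypothetical and not taken from any of the tools mentioned in this talk.

```python
def route_predictions(predictions, threshold=0.9):
    """Split predictions into auto-accepted items and items for human review."""
    accepted, review = [], []
    for item, label, confidence in predictions:
        # Anything below the threshold is never accepted without a human check.
        target = accepted if confidence >= threshold else review
        target.append((item, label))
    return accepted, review


# Hypothetical model output: (item id, predicted label, confidence).
sample_predictions = [
    ("paper-1", "relevant", 0.97),
    ("paper-2", "relevant", 0.55),
    ("paper-3", "not relevant", 0.91),
]

auto_accepted, needs_review = route_predictions(sample_predictions)
```

Lowering the threshold reduces the review workload at the cost of more unchecked decisions, so the value is a policy choice rather than a technical constant.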
Semantic chatbot - ontology lookup service
Source: Semantic Bot
Hypothes.is annotations as a peer review service
1. The AI pipeline extracts domain-specific entities and ranks relevant CORD-19 papers.
2. Automatically extracted entities and statements will be added; important fragments should be highlighted.
3. Human annotators should verify the results and validate all statements.
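The second step above can be sketched as the construction of an annotation body in the shape accepted by the Hypothes.is API, where a `TextQuoteSelector` marks the highlighted fragment. The function name, the `needs-review` tag, the URL and the sample text are assumptions for illustration; sending the payload (an authenticated POST to the annotations endpoint) is not shown.

```python
import json


def build_annotation(uri, quote, comment, tags):
    """Build a Hypothes.is-style annotation body that highlights `quote`."""
    return {
        "uri": uri,
        "text": comment,
        "tags": list(tags),
        "target": [
            {
                "source": uri,
                "selector": [{"type": "TextQuoteSelector", "exact": quote}],
            }
        ],
    }


# Hypothetical machine-generated annotation awaiting human validation.
annotation = build_annotation(
    uri="https://example.org/paper",
    quote="binds to the ACE2 receptor",
    comment="Entity: ACE2 (extracted automatically)",
    tags=["ACE2", "needs-review"],
)
payload = json.dumps(annotation)
```

A reviewer could then drop the `needs-review` tag once the statement has been verified, which keeps the machine and human contributions distinguishable.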
Doccano annotation with Machine Learning
Source: Doccano Labs
Metrics and integration with Apache Superset
Source: Apache Superset (Open Source)
BeFAIR Open Science Framework
Visit https://github.com/CoronaWhy/befair
Basic Infrastructure of COVID-19 Museum
Thank you! Questions?
Slava Tykhonov
DANS-KNAW
vyacheslav.tykhonov@dans.knaw.nl

Vector Databases 101 - An introduction to the world of Vector Databases
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Building COVID-19 Museum as Open Science project overview

  • 1. Building COVID-19 Museum as Open Science project Slava Tykhonov, DANS-KNAW, Royal Netherlands Academy of Arts and Sciences Séminaire doctoral Covid-19 Museum, Université de Paris / 09.04.2021
  • 2. About me: DANS-KNAW projects (2016-2021) ● CLARIAH+ (ongoing) ● EOSC Synergy (ongoing) ● SSHOC Dataverse (ongoing) ● CESSDA Dataverse Europe 2018 ● Time Machine Europe Supervisor at DANS-KNAW ● PARTHENOS Horizon 2020 ● CESSDA PID (Persistent Identifiers) Horizon 2020 ● CLARIAH ● RDA (Research Data Alliance) PITTS Horizon 2020 ● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020 Source: LinkedIn
  • 3. Moving towards Open Science Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017)
  • 4. Time Machine project ● An international collaboration to bring 5000 years of European history to life ● Digitising millions of historical documents, paintings and monuments ● The largest computer simulation ever developed ● An open access, interactive resource ● 600+ consortium members from European countries ● Top academic and research institutions ● Private sector partners from SMEs to international companies “Our focus is on the joint efforts on Big Data, artificial intelligence, augmented reality and 3D and the development of European platforms in line with European values”. Visit http://timemachine.eu
  • 5.
  • 6. Data Management in the Time Machine(s) ● Primary Data (Objects) should be preserved in the Digital Archive with persistent identifiers (Trusted Digital Repository) ● Secondary Data will be stored in the research infrastructure, keeping data versioning and provenance information ● The Linked Open Data Cloud (LOD) will provide the layer of interoperability ● Reuse of research data should follow FAIR principles: all research data should be Findable, Accessible, Interoperable and Reusable
  • 7. DANS-KNAW is one of the worldwide leaders in FAIR Data (FAIRsFAIR)
  • 8. Collaboration and best practice data sharing Merce Crosas, “Harvard Data Commons”
  • 9. We need a horizontal platform to serve vertical teams Source: CoronaWhy organization
  • 10. What is Dataverse? ● Open source data repository developed by IQSS at Harvard University ● A mature product with a long history (since 2006), created by an experienced, Agile development team ● Clear vision and understanding of research communities' requirements, with a public roadmap ● A well-developed architecture with rich APIs makes it possible to build application layers around Dataverse ● The strong community behind Dataverse helps to improve the basic functionality and develop it further ● DANS-KNAW is leading the SSHOC WP5.2 task to deliver a production-ready Dataverse repository for the European Open Science Cloud (EOSC) communities CESSDA, CLARIN and DARIAH.
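The "rich APIs" point on slide 10 can be illustrated with a minimal sketch that builds a request URL for the Dataverse Search API (`/api/search` with the `q`, `type` and `per_page` parameters). The host `https://demo.dataverse.org` and the helper name `build_search_url` are assumptions chosen for illustration, not part of the talk; the sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str,
                     item_type: str = "dataset", per_page: int = 10) -> str:
    """Return a Dataverse Search API URL for the given query string.

    base_url is the root of a Dataverse installation (assumed here);
    the helper name and defaults are illustrative only.
    """
    params = {"q": query, "type": item_type, "per_page": per_page}
    # urlencode escapes spaces and special characters in the query.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


# Example host is an assumption; replace it with your own installation.
url = build_search_url("https://demo.dataverse.org", "COVID-19")
print(url)
```

An application layer built around Dataverse would fetch this URL with any HTTP client and read the JSON response; that step is left out so the sketch stays free of network access.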
  • 11. Federated Dataverse data repositories worldwide Source: Merce Crosas, Harvard Data Commons
  • 12. Benefits of the Common Data Infrastructure ● It is distributed and sustainable, suitable for the future ● Maintenance costs will drop massively: the more organizations join, the less expensive it will be to support ● The maintenance budget could be reallocated to training and further development of new (common) features ● Reuse of the same infrastructure components will reinforce the quality and speed of knowledge exchange ● Building multidisciplinary teams that reuse the same infrastructure can bring new insights and unexpected views ● Common Data Infrastructure plays the role of a universal gravitation layer for Data Science projects (and so on…)
  • 13. DataverseNL collaborative data network Source: https://www.dataverse.nl
  • 14. Data Stations - Future Data Services Dataverse is an API-based data platform and a key framework for Open Innovation!
  • 15. Open vs Closed Innovation
  • 16. Coming back to FAIR principles Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 17. Interoperability in the European Open Science Cloud (EOSC) ● Technical interoperability defined as the “ability of different information technology systems and software applications to communicate and exchange data”. It should allow “to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”. ● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data”. ● Organisational interoperability refers to the “way in which organisations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. Focus on the requirements of the user community by making services available, easily identifiable, accessible and user-focused”. ● Legal interoperability covers “the broader environment of laws, policies, procedures and cooperation agreements” Source: EOSC Interoperability Framework v1.0
  • 18. Our goals to increase interoperability on the global scale Provide a custom FAIR metadata schema for European research communities: ● CESSDA metadata (Consortium of European Social Science Data Archives) ● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community Connect metadata to ontologies and external controlled vocabularies: ● link metadata fields to common ontologies (Dublin Core, DCAT) ● define semantic relationships between (new) metadata fields (SKOS) ● select available external controlled vocabularies for the specific fields ● provide multilingual access to controlled vocabularies All contributions should go into the Dataverse source code and be available worldwide!
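To make "link metadata fields to common ontologies" on slide 18 concrete, here is a minimal sketch that maps local field names to Dublin Core and DCAT property URIs and emits subject/predicate/object triples. The field names, the mapping table and the example DOI are hypothetical; only the property URIs themselves come from the published Dublin Core Terms and DCAT namespaces.

```python
# Hypothetical mapping from local metadata field names to ontology properties.
FIELD_TO_PROPERTY = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
}


def to_triples(dataset_uri, metadata):
    """Turn a flat metadata record into (subject, predicate, object) triples."""
    triples = []
    for field, value in metadata.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields are skipped rather than guessed
        # Accept both single values and lists of values.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((dataset_uri, prop, item))
    return triples


# Illustrative record; the DOI uses a test prefix and is not a real dataset.
record = {
    "title": "COVID-19 expert questions",
    "keyword": ["COVID-19", "SARS-CoV-2"],
    "notes": "internal remark",
}
triples = to_triples("https://doi.org/10.5072/FK2/EXAMPLE", record)
```

A table like this is one simple way to keep the field-to-ontology decisions in a single reviewable place; a real deployment would more likely store them in the metadata schema itself.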
  • 19. The importance of standards and ontologies Generic controlled vocabularies to link metadata in bibliographic collections are well known: ORCID, GRID, GeoNames, Getty. Medical knowledge graphs are powered by: ● Biological Expression Language (BEL) ● Medical Subject Headings (MeSH®) by U.S. National Library of Medicine (NIH) ● Wikidata (Open ontology) - Wikipedia Integration is based on metadata standards: ● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI) Most of the prominent ontologies are already available as Web Services with API endpoints.
  • 20. SKOSMOS framework to discover ontologies ● SKOSMOS is developed in Europe by the National Library of Finland (NLF) ● active global user community ● search and browsing interface for SKOS concepts ● multilingual vocabularies support ● used for different use cases (publishing vocabularies, building discovery systems, vocabulary visualization)
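The search and multilingual features on slide 20 can be sketched as a lookup against the Skosmos REST API, which exposes a search endpoint under `/rest/v1/search` taking `query`, `lang` and an optional `vocab` parameter. The host `https://skosmos.example.org`, the helper names and the sample response below are assumptions for illustration; the response shape (a `results` list whose items carry `uri` and `prefLabel`) should be checked against the Skosmos instance you use.

```python
from urllib.parse import urlencode


def build_skosmos_search_url(base_url, term, lang="en", vocab=None):
    """Build a Skosmos REST search URL for a term in a given language."""
    params = {"query": term, "lang": lang}
    if vocab:
        params["vocab"] = vocab  # restrict the search to one vocabulary
    return f"{base_url.rstrip('/')}/rest/v1/search?{urlencode(params)}"


def extract_suggestions(response):
    """Pick (uri, label) pairs out of a decoded search response."""
    return [
        (hit["uri"], hit["prefLabel"])
        for hit in response.get("results", [])
        if "uri" in hit and "prefLabel" in hit
    ]


# Hypothetical host and response, used only to show the data flow.
search_url = build_skosmos_search_url("https://skosmos.example.org", "covid*", lang="fr")
sample_response = {
    "results": [
        {"uri": "http://example.org/vocab/c1", "prefLabel": "COVID-19", "lang": "fr"},
        {"uri": "http://example.org/vocab/c2"},  # no label: skipped
    ]
}
suggestions = extract_suggestions(sample_response)
```

Changing the `lang` argument is what would let a user interface offer suggested terms in the language the user has selected.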
  • 21. The same metadata field linked to many ontologies Language switch in Dataverse will change the language of suggested terms!
  • 22. Use case: COVID-19 expert questions Source: Epidemic Questions Answering “In response to the COVID-19 pandemic, the Epidemic Question Answering (EPIC-QA) track challenges teams to develop systems capable of automatically answering ad-hoc questions about the disease COVID-19, its causal virus SARS-CoV-2, related corona viruses, and the recommended response to the pandemic. While COVID-19 has been an impetus for a large body of emergent scientific research and inquiry, the response to COVID-19 raises questions for consumers.”
  • 23. COVID-19 questions in SKOSMOS framework
  • 24. COVID-19 questions in Dataverse metadata Source: COVID-19 European data hub in Harvard Dataverse ● COVID-19 ontologies can be hosted by the SKOSMOS framework ● Researchers can enrich metadata by adding standardized questions provided by SKOSMOS ontologies ● Rich metadata can be exported back to the Linked Open Data Cloud to increase the chance of being found ● Enriched metadata can be used for further training of ML models
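The enrichment step on slide 24 could look roughly like the sketch below, which wraps vocabulary terms into a Dataverse-style `keyword` compound field. The child field names `keywordValue`, `keywordVocabulary` and `keywordVocabularyURI` follow the citation metadata block as commonly documented, but treat them as an assumption and verify them against your installation; the helper name, vocabulary name and URI are hypothetical.

```python
import json


def keyword_field(terms):
    """Build a 'keyword' compound field from (label, vocabulary, uri) tuples."""

    def leaf(name, value):
        # Each child of a compound field is a primitive, single-valued entry.
        return {"typeName": name, "typeClass": "primitive",
                "multiple": False, "value": value}

    return {
        "typeName": "keyword",
        "typeClass": "compound",
        "multiple": True,
        "value": [
            {
                "keywordValue": leaf("keywordValue", label),
                "keywordVocabulary": leaf("keywordVocabulary", vocabulary),
                "keywordVocabularyURI": leaf("keywordVocabularyURI", uri),
            }
            for label, vocabulary, uri in terms
        ],
    }


# Hypothetical term, vocabulary name and URI, for illustration only.
field = keyword_field([
    ("What is the origin of COVID-19?", "EPIC-QA questions",
     "http://example.org/vocab/epic-qa/"),
])
print(json.dumps(field, indent=2))
```

Keeping the vocabulary name and URI next to each term is what allows the exported metadata to point back to the ontology it came from.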
  • 25. Weblate as a localization service ● Localization/internationalization is key in Large Scale projects ● Weblate makes it possible to translate content in a structured way ● Several options for project visibility: accept translations from the crowd, or only give access to a select group of translators ● Weblate indicates untranslated strings, strings with failing checks, and strings that need approval ● When new strings are added, Weblate can indicate which strings are new and untranslated. Visit http://translation.cessda.eu
  • 26. Dataverse App Store ● Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render, audio, JSON, GeoJSON/Shapefiles/Map, XML ● Interoperability: external controlled vocabularies ● Data processing: NESSTAR DDI migration tool ● Linked Data: RDF compliance including SPARQL endpoint (FDP) ● Federated login: eduGAIN, PIONIER ID ● CLARIN Switchboard integration: Natural Language Processing tools ● Visualization tools (maps, charts, timelines)
  • 28. Dataverse and CLARIN tools integration
  • 29. Using Artificial Intelligence and Machine Learning 1300+ people registered in the organization
  • 30. Supporting Semantic Web solutions Historically, most datasets have been preserved in data silos (archives), not interlinked and lacking standardization. There are cultural, structural and technological challenges. Solutions: ● Integrating Linked Data and Semantic Web technologies, forcing research communities to share data and add more interoperability following FAIR principles ● Creating a standardized (meta)data layer for Large Scale projects like Time Machine and CoronaWhy ● Working on the automatic linkage of metadata to ontologies and external controlled vocabularies
  • 31. Why Artificial Intelligence? Human resources are very expensive and scarce, and it is difficult to find the appropriate expertise in-house. Solution: ● Building AI/ML pipelines for automatic metadata enrichment and linkage prediction ● Applying NLP for NER, data mining, topic classification etc. ● Building multidisciplinary knowledge graphs should facilitate the development of new projects with economic and social scientists; they will take ownership of their own data if they see value (Clio Infra)
  • 32. Using modern NLP frameworks (spaCy, for example) Recognised entities can form a metadata layer and be stored in Dataverse
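How recognised entities might "form a metadata layer" (slide 32) is sketched below, assuming the entities arrive as `(text, label)` pairs, which is what a spaCy pipeline yields via `[(ent.text, ent.label_) for ent in doc.ents]`. The function name and the sample entities are illustrative; the NER step itself is left out so the sketch runs without any model installed.

```python
from collections import defaultdict


def entities_to_metadata(entities):
    """Group (text, label) entity pairs into a label -> unique values mapping."""
    grouped = defaultdict(list)
    for text, label in entities:
        cleaned = text.strip()
        # Keep first-seen order and drop empty or repeated mentions.
        if cleaned and cleaned not in grouped[label]:
            grouped[label].append(cleaned)
    return dict(grouped)


# Sample pairs standing in for NER output; labels follow common spaCy names.
sample_entities = [
    ("Paris", "GPE"),
    ("DANS-KNAW", "ORG"),
    ("Paris ", "GPE"),
    ("  ", "ORG"),
]
metadata_layer = entities_to_metadata(sample_entities)
```

The resulting dictionary can then be mapped onto repository metadata fields, for example one field per entity label.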
  • 33. How to control Artificial Intelligence Problem: It is naive to fully trust Machine Learning and AI; we need to support “human in the loop” processes to keep control over automatic workflows. Ethics is also important, as is the problem of detecting fakes. Solution: Many “human in the loop” tools have already been developed in research projects. We need to support the best of them for the different use cases, bring them to the appropriate maturity (for example, with CI/CD) and introduce them to research communities.
  • 34. Human in the loop General blueprint for a human-in-the-loop interactive AI system. Credits: Stanford University HAI. The question shifts from “how do we build a smarter system?” to “how do we incorporate useful, meaningful human interaction into the system?”
  • 35. Semantic chatbot - ontology lookup service Source: Semantic Bot
  • 36. Hypothes.is annotations as a peer review service 1. The AI pipeline extracts domain-specific entities and ranks relevant CORD-19 papers. 2. Automatically extracted entities and statements are added, and important fragments are highlighted. 3. Human annotators verify the results and validate all statements.
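The three steps on slide 36 can be sketched as a small review queue in which machine-generated annotations stay pending until a human accepts or rejects them. The class, field names, status values and sample data are hypothetical and do not reflect the Hypothes.is data model or API; the point is only the human-in-the-loop control flow.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A machine-generated statement awaiting human validation."""
    paper_id: str
    text: str
    score: float              # ranking score from the AI pipeline
    status: str = "pending"   # pending -> validated | rejected


def review_queue(annotations, limit=10):
    """Return pending annotations, highest-ranked first."""
    pending = [a for a in annotations if a.status == "pending"]
    return sorted(pending, key=lambda a: a.score, reverse=True)[:limit]


def record_verdict(annotation, accepted):
    """Store the human annotator's decision on one annotation."""
    annotation.status = "validated" if accepted else "rejected"
    return annotation


# Illustrative data; identifiers and statements are made up.
items = [
    Annotation("paper-1", "entity: SARS-CoV-2", 0.62),
    Annotation("paper-2", "entity: ACE2 receptor", 0.91),
]
first = review_queue(items)[0]
record_verdict(first, accepted=True)
remaining = review_queue(items)
```

Nothing leaves the pending state without an explicit human verdict, which is the property the slide asks for.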
  • 37. Doccano annotation with Machine Learning Source: Doccano Labs
  • 38. Metrics and integration with Apache Superset Source: Apache Superset (Open Source)
  • 39. BeFAIR Open Science Framework: Basic Infrastructure of COVID-19 Museum Visit https://github.com/CoronaWhy/befair
  • 40. Thank you! Questions? Slava Tykhonov DANS-KNAW vyacheslav.tykhonov@dans.knaw.nl