Presentation on Semantic Web technologies for the Dataverse Metadata Working Group run by the Institute for Quantitative Social Science (IQSS) at Harvard University.
Paper presented at the International Conference "What's Next in Libraries? Trends, Space, and Partnerships", held January 21-23, 2015 at NIT Silchar, Assam, jointly organized by NIT Silchar in association with its USA partner, the Mortenson Center for International Library Programs, University of Illinois at Urbana-Champaign.
Introduction to Persistent Identifiers | www.eudat.eu | EUDAT
This document provides an introduction to persistent identifiers (PIDs) and their use in the EUDAT system. It defines PIDs as globally unique identifiers that can be used to persistently identify digital objects. The document discusses why PIDs are useful, describing problems with URLs like link rot. It then covers different PID systems like Handle and DOI, as well as EUDAT's use of Handle through the B2HANDLE service. The document also discusses PID policies, use cases, and the B2HANDLE Python library for programmatic PID management.
Building a collaborative machine learning platform for the Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
This document provides an overview of metadata standards, including their purpose and types. It describes the MARC 21 and Dublin Core metadata standards in detail. MARC 21 is the predominant bibliographic standard, with formats for bibliographic data, holdings, and authority data. It exists in both MARC 21 and MARCXML syntaxes. Dublin Core is a simpler standard for resource discovery with 15 basic elements. It includes both simple and qualified versions with controlled vocabularies. The document lists several metadata standards and development organizations.
Presentation of Research Data Management (RDM) in CORA, the Catalan Open Research Area, at the Universitat Internacional de Catalunya (UIC), 12 July 2022.
This document discusses the components and technologies of digital libraries. It describes the key components as selection and acquisition, organization through metadata assignment, indexing and storage in a repository, and search and retrieval via a digital library website. It then associates various technologies with these components, such as metadata standards, document formats, repository systems like DSpace and Fedora, and semantic technologies.
A PowerPoint presentation about Networking and Resource Sharing in Library and Information Services: the case study of consortium building.
Prepared by May Joyce M. Dulnuan
National Education Policy and role of Libraries, by Dr Trivedi
The document discusses India's new National Education Policy (NEP) and the role of libraries. It notes that the NEP aims to provide universal access to quality education through digital technologies like e-learning and online learning. It emphasizes that libraries are important to support curriculum and research. Academic libraries must have digital collections in multiple languages and formats. The NEP recognizes leveraging technology while addressing equity and access issues. Librarians should focus collections and lessons on developing skills like critical thinking, problem solving, and digital/information literacy.
Challenges and opportunities for academic libraries, by lisld
Research and learning behaviors are changing in a network environment. What challenges do Academic libraries face? What opportunities do they have? A presentation given at a symposium on the future of academic libraries at the Open University.
The document discusses emerging trends in libraries, including virtual reality, social media, bleeding-edge translation technologies, media labs, video streaming, artificial intelligence, digital interfaces for printed books, blockchain technology, the internet of things, drones, and cloud computing. Virtual reality and translation technologies are allowing new immersive experiences for library users, while social media, media labs, and video streaming are enhancing access to content. Emerging technologies like artificial intelligence, digital books, blockchain, IoT, and drones provide new opportunities, while cloud computing expands storage and access to library resources.
DESIDOC is the Defence Scientific Information and Documentation Centre established in 1970 in Delhi, India. It operates under the Defence Research and Development Organization (DRDO) to disseminate science and technology information on cutting edge defence technologies. DESIDOC's vision is to be a centre of excellence for knowledge management in DRDO. It provides library resources and access to databases for DRDO headquarters and laboratories. DESIDOC conducts training programs and publishes various periodicals related to defence research.
The document discusses the Metadata Encoding and Transmission Standard (METS), which is an XML schema for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. It describes the characteristics and sections of a METS file, including the header, descriptive and administrative metadata, file and structural map sections. Current users of METS are also listed, such as libraries and universities. The purpose of METS is to provide a flexible structure for linking metadata and content about digital objects.
Hans Peter Luhn was a computer scientist at IBM who created the Luhn algorithm and developed methods for information retrieval. He proposed counting term occurrences in documents to determine relevance. Luhn and his associates produced early keyword-in-context (KWIC) indexes, where each keyword occurrence was displayed in a list with surrounding words. The American Chemical Society adopted KWIC indexing in 1961. KWIC indexes arrange entries alphabetically based on keywords extracted from document titles, with the keyword centered and a location code at the end. While easy to generate automatically, KWIC indexes lack terminology control and some titles can be misleading.
This document provides an introduction to linked data and open data. It discusses the evolution of the web from documents to interconnected data. The four principles of linked data are explained: using URIs to identify things, making URIs accessible, providing useful information about the URI, and including links to other URIs. The differences between open data and linked data are outlined. Key milestones in linked government data are presented. Formats for publishing linked data like RDF and SPARQL are introduced. Finally, the 5 star scheme for publishing open data as linked data is described.
A presentation by Dr. Shailendra Kumar, Delhi University, during National Workshop on Library 2.0: A Global Information Hub, Feb 5-6, 2009 at PRL Ahmedabad
This document provides an overview of metadata and discusses its various types and uses. It defines metadata as data that describes other data, similar to street signs or maps that communicate information. There are three main types of metadata: descriptive, structural, and administrative. Descriptive metadata is used to describe resources for discovery and identification, structural metadata defines relationships between parts of a resource, and administrative metadata provides technical and management information. The document provides many examples of metadata usage and notes that metadata is key to the functioning of libraries, the web, software, and more. It is truly everywhere.
This document discusses web-scale discovery services (WDS), including what they are, their key features and benefits, examples of major WDS providers, and considerations for implementation. Specifically:
- WDS allows users to search a library's entire collection through a single search box, ranking results based on relevancy across sources. This is presented as an improvement over federated search.
- Major WDS providers discussed include EBSCO Discovery Service, Ex Libris Primo, Serials Solutions Summon, and OCLC's WorldCat Local.
- A comparison of these providers shows they index a variety of content like the library catalog, e-books, journals, and more.
One Nation One Subscription journal-access plan of India, by Rangoli Awasthi
The government will negotiate with the world’s leading scientific publishers to set up a nationwide “One Nation One Subscription” journal-access plan, making scholarly literature, currently available only to scholars at individually subscribing institutions, freely accessible to everyone.
Presented at the 2018 LRCN National Workshop on Electronic Resource Management Systems in Libraries, held at the University of Nigeria, Nsukka, Enugu State, Nigeria.
DSpace is an open source digital repository software package typically used to create open access repositories for scholarly content. It can store any digital media type and is optimized for text-based files. DSpace uses a Java platform with a PostgreSQL or Oracle database and has features like full-text search, persistent identifiers, and the ability to handle any file type. The community development model is open source under a BSD license.
Digital Humanities: the role of librarians and libraries. The use of digital evidence and methods: digital authoring, publishing, digital curation and preservation, and the digital use and reuse of scholarship.
This presentation is mainly for library professionals and digital humanities cohorts.
The document discusses interoperability in digital libraries. It describes how digital libraries aim to support interoperability at three levels: data gathering, harvesting, and federation. It also discusses protocols used for interoperability such as OAI-PMH, DCMES, and LDAP: OAI-PMH enables harvesting of metadata, DCMES defines a set of 15 elements for resource description, and LDAP enables locating resources on a network.
NISCAIR was formed in 2002 by merging NISCOM and INSDOC to disseminate science and technology information. It provides various information services including publishing journals, conducting training programs, operating an online periodical repository, and managing databases. NISCAIR aims to be the prime custodian of science and technology knowledge in India and promote communication through traditional and modern means.
RDA is a set of guidelines for cataloging digital resources that is based on FRBR and FRAD models. It addresses shortcomings of AACR2 for describing online resources. The RDA Toolkit provides the full RDA instructions and tools like mappings, workflows and an element set to support efficient RDA implementation. It is maintained by the RDA Steering Committee and aims to produce robust data that clearly defines relationships for discovery of resources in libraries, archives and other cultural heritage organizations.
Apache Atlas provides metadata services and a centralized metadata repository for Hadoop platforms. It aims to enable data governance across structured and unstructured data through hierarchical taxonomies. Upcoming features include expanded dataset lineage tracking and integration with Apache Kafka and Ranger for dynamic access policy management. Challenges of big data management include scaling traditional tools to handle large volumes of entities and metadata, and Atlas addresses this through its decentralized and metadata-driven approach.
Flexible metadata schemes for research data repositories - CLARIN Conference'21, by Vyacheslav Tykhonov
The development of the Common Framework in Dataverse and the CMDI use case. Building an AI/ML-based workflow for predicting and linking concepts from external controlled vocabularies to CMDI metadata values.
CLARIN CMDI use case and flexible metadata schemes, by vty
Presentation for the CLARIAH IG Linked Open Data on the latest developments for the Dataverse FAIR data repository. Building a SEMAF workflow with external controlled-vocabulary support and a Semantic API. Using TRIZ, the theory of inventive problem solving, for further innovation in Linked Data.
This document discusses the five-year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
CLARIAH Tech Day: Controlled Vocabularies and Ontologies in Dataverse, by vty
This presentation is about external controlled-vocabulary (CV) support in Dataverse, an open source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Decentralised identifiers and knowledge graphs, by vty
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
The Web of Linked Open Data, or LOD, is the most relevant achievement of the Semantic Web. Initially proposed by Tim Berners-Lee in a seminal paper published in Scientific American in 2001, the Semantic Web envisions a web where software agents can interact with large volumes of structured, easy-to-process data. Users now have at their disposal the first mature results of this vision. Among them, and probably the most significant, are the different LOD initiatives and projects that publish open data in standard formats like RDF.
This presentation provides an overview and comparison of different LOD initiatives in the area of patent information, and analyses potential opportunities for building new information services based on widely available datasets of patent information. The information is based on interviews conducted with innovation agents and on analysis of the professional bibliography and current implementations.
LOD opportunities are not restricted to information aggregators; they also extend to end users and innovation agents who face the difficulties of dealing with large amounts of data. In both cases, the opportunities offered by LOD need to be assessed, as LOD has become a standard, universal method to distribute, share and access data.
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the..., by Andrea Scharnhorst
Presentation given at ISKO UK: research observatory, November 24, 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Vyacheslav Tykhonov, Jerry de Vries, Eko Indarto, Femmy Admiraal, Mike Priddy, and Andrea Scharnhorst: Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the DANS EASY Research Data Repository
Abstract:
The development of metadata schemes in data repositories (and other content providers) has always been a process of negotiation between the needs of the designated user communities and the content of the collection on the one side, and the standards developed in the field on the other. Automatisation has both enabled and enforced standardisation and alignment of metadata schemes (see as an example). But, while designated user communities have turned from local users into global ones (due to web services), their specific needs have not vanished. Technology offers possibilities to give the aforementioned negotiation a new form. In this presentation, we present the Dataverse platform, used by many data repositories. We show, using the case of the CMDI metadata and the CLARIN (Common Language Resources and Technology Infrastructure) community, how the Dataverse common core set of metadata, called the Citation Block, can be extended with custom fields defined as a discipline-specific metadata block. In particular, we show how these custom fields can be connected to a distributed network of authoritative controlled vocabularies, so that in the end semantic search is possible. The presentation highlights opportunities and challenges, based on our own experiences. Related work has been presented at the CLARIN Annual Conference 2021 (see Proceedings).
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN..., by vty
Presentation at ISKO Knowledge Organisation Research Observatory. RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES AND DOMAIN NEEDS
Building a COVID-19 Museum as an Open Science Project, by vty
This document discusses building a COVID-19 Museum as an open science project. It describes the speaker's background working on various data management projects. It discusses moving towards open science and sharing data according to FAIR principles. It outlines the Time Machine project for digitizing historical documents and its approach to data management. The rest of the document discusses using the Dataverse platform to build repositories, linking metadata to ontologies, using tools like Weblate for translations, and exploring the use of artificial intelligence and machine learning to enhance metadata and facilitate human-in-the-loop review processes.
A Linked Fusion of Things, Services, and Data to Support a Collaborative Data..., by Eric Stephan
This document discusses linking together data, services, and things to support a collaborative data management facility for a wind characterization scientific study. It proposes using semantic technologies like REST, Linked Open Data, Linked Services, and concepts from the Internet of Things. The approach aims to seamlessly link the study's instruments, services, activities, and data to gain insights and make everything accessible and discoverable for researchers. It leverages existing open-source and commercial tools and illustrates how a linked knowledge environment can support search and discovery across components for both facility operations and scientists using the study results.
This document discusses metadata, which is structured data that describes and helps manage information resources. There are different types of metadata including descriptive, structural, and administrative. Metadata serves important functions like allowing resources to be discovered and organized. Several metadata standards are discussed, including Dublin Core, METS, MODS, EAD, and LOM. The document also covers metadata creation, quality issues, and ways metadata can be improved.
The document discusses semantic mapping in CLARIN Component Metadata Infrastructure (CMDI). CMDI allows flexible yet semantically interoperable metadata descriptions through the use of explicit schemas and semantic registries like ISOcat and RelationRegistry. These registries define concepts and relationships that can be shared across metadata profiles and elements. Semantic mapping helps achieve recall and disambiguation in metadata searches across the diverse set of CMDI profiles and components.
Semantics in Financial Services - David Newman, by Peter Berger
David Newman serves as a Senior Architect in the Enterprise Architecture group at Wells Fargo Bank. He has been following semantic technology for the last three years and has developed several business ontologies. He has been instrumental in thought leadership at Wells Fargo on the application of semantic technology, and he represents the Financial Services Technology Consortium (FSTC) on the W3C SPARQL Working Group.
Nelson Piedra, Janneth Chicaiza, and Jorge López (Universidad Técnica Particular de Loja), Edmundo Tovar (Universidad Politécnica de Madrid), and Oscar Martínez (Universitas Miguel Hernández): explore the advantages of using linked data with OERs.
Decentralised identifiers for CLARIAH infrastructure, by vty
Slides of the presentation for the CLARIAH community on ideas for making controlled vocabularies sustainable and FAIR (Findable, Accessible, Interoperable, Reusable) with the help of Decentralized Identifiers (DIDs).
Dataverse repository for research data in the COVID-19 Museum, by vty
The COVID-19 Museum has the ambition to create a platform to deposit, consult, aggregate, and study heterogeneous data about the pandemic using the features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with a built-in search engine and functionality that allows adding computing resources to explore archived resources, both data and metadata. Presentation by Slava Tykhonov, DANS-KNAW (the Royal Netherlands Academy of Arts and Sciences), Université Paris Cité, 19 April 2022.
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available at https://www.youtube.com/watch?v=G9oiyNM_RHc
Controlled vocabularies and ontologies in the Dataverse data repository, by vty
This document discusses supporting external controlled vocabularies in Dataverse. It proposes implementing a JavaScript interface to allow linking metadata fields to terms from external vocabularies accessed via SKOSMOS APIs. Several challenges are identified, such as applying support to any field, backward compatibility, and ensuring vocabularies come from authoritative sources. Caching concepts and linking dataset files directly to terms are also proposed to improve interoperability.
Automated CI/CD testing, installation and deployment of Dataverse infrastruct..., by vty
This document summarizes a presentation about automating CI/CD testing, installation, and deployment of Dataverse in the European Open Science Cloud. It discusses using Docker and Kubernetes for deployment, a community-driven QA plan using pyDataverse for test automation, and providing quality assurance as a service. The presentation also covers topics like the CESSDA maturity model, integrating Dataverse on Google Cloud, and using serverless computing for some Dataverse applications and services.
External controlled vocabularies support in Dataverse, by vty
This presentation discusses adding support for external controlled vocabularies to the Dataverse data repository platform. It describes how ontologies like SKOS can be used to represent vocabularies and allow linking metadata fields in Dataverse to terms. The presentation proposes developing a Semantic Gateway plugin for Dataverse that would allow browsing and linking to external vocabularies hosted in the SKOSMOS framework via its API. This could improve metadata by allowing standardized, linked terms and help make data more FAIR.
Dataverse can be deployed using Docker containers to improve maintainability and portability. The document discusses how Docker can isolate applications and their dependencies into portable containers. It provides an example of deploying Dataverse as a set of microservices within Docker containers. Instructions are included on building Docker images, running containers, and managing the containers and images through commands and tools like Docker Desktop, Docker Hub, and Docker Compose.
Technical integration of data repositories: status and challenges, by vty
This document discusses technical integration of data repositories, including:
- Previous integration initiatives focused on metadata integration using OAI-PMH and ResourceSync protocols, as well as aggregators like OpenAIRE.
- Challenges to integration include different levels of software/service maturity, maintenance of distributed applications, and use of common standards and vocabularies.
- Potential integration efforts could focus on improving FAIRness, metadata/data flexibility, and connections between repositories, software, and computing resources to better enable reuse of EOSC data and services.
SSHOC Dataverse in the European Open Science Cloud, by vty
This project summary covers the SSHOC project which aims to create a social sciences and humanities section of the European Open Science Cloud by maximizing data reuse through open science principles. The project will interconnect existing and new infrastructures through a clustered cloud, establish governance for SSH-EOSC, and provide a research data repository service for SSH institutions through further developing the Dataverse platform on EOSC. The project involves 47 partners across 20 beneficiaries and 27 linked third parties with a budget of €14,455,594.08 over 40 months to achieve these objectives.
Running Dataverse repository in the European Open Science Cloud (EOSC), by vty
The document discusses Dataverse, an open source data repository software. It summarizes that Dataverse was developed by Harvard University, has a large community and development team, and is used by many countries as a data repository infrastructure. It then describes the SSHOC Dataverse project which aims to create a multilingual, standardized, and reusable open data infrastructure across several European countries. Finally, it notes that Dataverse is a reliable cloud service that enables FAIR data sharing and can be easily deployed by research organizations.
Data standardization process for social sciences and humanities, by vty
This document discusses data standardization processes at DANS-KNAW. It describes how DANS-KNAW standardizes metadata during data deposit and harvesting through controlled vocabularies. It also discusses how DANS is developing the SSHOC DataverseEU project to standardize metadata across several European countries. The document concludes by emphasizing the importance of tracking provenance information and developing standardization pipelines and services to improve data and metadata access.
This document summarizes the development process for the DataverseSSHOC project. It outlines two parallel development tracks - a core development team modifying the Dataverse core functionality, and an application development team creating new tools. Tasks are managed using Trello. Code is stored in GitHub and BitBucket. The development follows a SCRUM process with Docker images available on Docker Hub and a Kubernetes cluster for deployment. Testing includes unit, integration, performance, and A/B testing to comply with CESSDA maturity standards.
1. Ontologies, controlled vocabularies and Dataverse
Slava Tykhonov
Senior Information Scientist,
Research & Innovation (DANS-KNAW)
Dataverse community call, Harvard University, 03.12.2020
2. Overall goals for DANS-KNAW
● DANS-KNAW is running the EASY Trusted Digital Repository as a service; it is time to get data back from the archive, convert it, and put it into Dataverse, ready for curation
● DANS-KNAW wants to run Data Stations with metadata created and maintained by different research communities
● the long-term goal of DANS is to make all datasets harvestable and approachable, and to create an interoperability layer with external controlled vocabularies (FAIR Data Point)
4. The importance of standards and ontologies
Generic controlled vocabularies used to link metadata in bibliographic collections are well known: ORCID, GRID, GeoNames, Getty.
Medical knowledge graphs are powered by:
● Biological Expression Language (BEL)
● Medical Subject Headings (MeSH®) by the U.S. National Library of Medicine (NIH)
● Wikidata (open ontology) - Wikipedia
Integration based on metadata standards:
● MARC21, Dublin Core (DC), Data Documentation Initiative (DDI)
Most of the prominent ontologies are already available as web services with API endpoints.
6. Interoperability in EOSC
● Technical interoperability is defined as the “ability of different information technology systems and software applications to communicate and exchange data”. It should allow systems “to accept data from each other and perform a given task in an appropriate and satisfactory manner without the need for extra operator intervention”.
● Semantic interoperability is “the ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data”.
● Organisational interoperability refers to the “way in which organisations align their business processes, responsibilities and expectations to achieve commonly agreed and mutually beneficial goals. Focus on the requirements of the user community by making services available, easily identifiable, accessible and user-focused”.
● Legal interoperability covers “the broader environment of laws, policies, procedures and cooperation agreements”.
Source: EOSC Interoperability Framework v1.0
7. Our goals to increase Dataverse interoperability
Provide custom FAIR metadata schemas for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community
Connect metadata to ontologies and CVs:
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for specific fields
● provide multilingual access to controlled vocabularies
8. Introduction of the Data Catalog Vocabulary (DCAT)
DCAT defines three main classes:
● dcat:Catalog represents the catalog
● dcat:Dataset represents a dataset in a catalog
● dcat:Distribution represents an accessible form of a dataset
DCAT makes extensive use of terms from RDF, Dublin Core, SKOS, and other vocabularies!
Source: W3C DCAT recommendation
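As a minimal sketch, assuming Python with the rdflib library (all URIs below are invented placeholders, not real catalog identifiers), the three classes fit together like this:

# Minimal DCAT description built with rdflib: a catalog that lists one
# dataset, which in turn has one downloadable distribution.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

catalog = URIRef("https://example.org/catalog")    # hypothetical URIs
dataset = URIRef("https://example.org/dataset/1")
dist = URIRef("https://example.org/dataset/1/data.csv")

g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCAT.dataset, dataset))            # the catalog lists the dataset
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example survey data", lang="en")))
g.add((dataset, DCAT.distribution, dist))          # the dataset has a distribution
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.downloadURL, URIRef("https://example.org/files/data.csv")))

print(g.serialize(format="turtle"))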
9. Simple Knowledge Organization System (SKOS)
SKOS models thesaurus-like resources:
- skos:Concept resources with preferred and alternative labels (synonyms) attached to them (skos:prefLabel, skos:altLabel).
- skos:Concept resources can be related with the skos:broader, skos:narrower and skos:related properties.
- terms and concepts can have more than one broader term or concept.
SKOS makes it possible to create a semantic layer on top of objects: a network of statements and relationships.
A major difference in SKOS is its logical "is-a" hierarchy: in traditional thesauri, the hierarchical relation can represent anything from "is-a" to "part-of".
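A small sketch of such a thesaurus fragment, again assuming rdflib; the two concept URIs are invented for illustration:

# Two SKOS concepts with labels and a broader/narrower relation.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

g = Graph()
g.bind("skos", SKOS)

animals = URIRef("https://example.org/vocab/animals")  # hypothetical concepts
mammals = URIRef("https://example.org/vocab/mammals")

g.add((animals, RDF.type, SKOS.Concept))
g.add((animals, SKOS.prefLabel, Literal("animals", lang="en")))
g.add((animals, SKOS.altLabel, Literal("fauna", lang="en")))  # synonym
g.add((mammals, RDF.type, SKOS.Concept))
g.add((mammals, SKOS.prefLabel, Literal("mammals", lang="en")))
g.add((mammals, SKOS.broader, animals))   # mammals has the broader concept animals
g.add((animals, SKOS.narrower, mammals))  # inverse statement

print(g.serialize(format="turtle"))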
10. RDF graph using the SKOS Core Vocabulary
Source: SKOS Core Guide
11. Global Research Identifier Database (GRID) in SKOS
Can we provide humans with a convenient web interface to create links to data points?
Can we use machine learning algorithms to predict links and convert data to SKOS automatically?
12. Linked Data integration challenges
● datasets are very heterogeneous and multilingual
● data usually lacks sufficient quality control
● data providers use different modeling schemas and styles
● linked data cleansing and versioning are very difficult to track and maintain properly; web resources aren't persistent
● even modern data repositories provide only metadata records describing the data, without giving access to the individual data items stored in files
● it is difficult to assign, and manually keep up to date, entity relationships in a knowledge graph
We need semantic relationships among metadata fields and their values!
13. What is semantics?
Semantics (from Ancient Greek: σημαντικός sēmantikós, "significant") is the study of meaning. The term can be used to refer to subfields of several distinct disciplines including linguistics, philosophy, and computer science.
Linguistics
In linguistics, semantics is the subfield that studies meaning. Semantics can address meaning at the levels of words, phrases, sentences, or larger units of discourse. One of the crucial questions which unites different approaches to linguistic semantics is that of the relationship between form and meaning.
Computer science
In computer science, the term semantics refers to the meaning of language constructs, as opposed to their form (syntax). According to Euzenat, semantics "provides the rules for interpreting the syntax which do not provide the meaning directly but constrains the possible interpretations of what is declared."
(from Wikipedia)
15. Dataverse datasetfield API
curl http://localhost:8080/api/admin/datasetfield/title
To-do list for the Dataverse core:
● add a TermURI for metadata fields (DC)
● show the external controlled vocabularies available for a specific field
● add multilingual support with a 'lang' parameter
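The same call from Python, as a hedged sketch using the requests library; it assumes a local Dataverse on port 8080 with the admin API reachable:

# Fetch the definition of the 'title' dataset field from the admin API.
import requests

resp = requests.get("http://localhost:8080/api/admin/datasetfield/title")
resp.raise_for_status()
print(resp.json())  # JSON description of the field (name, type, etc.)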
21. Semantic Gateway lookup API
Scenario: when the user selects a vocabulary and searches for a term, the API receives the filled-in values and returns the list of matching concepts in a standardized format:
GET /?lang=language&vocab=vocabulary&term=keyword
Examples:
GET /?lang=en&vocab=unesco&query=fam
GET /?vocab=mesh&query=sars
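A client for this lookup might look like the sketch below, assuming the requests library; GATEWAY is a hypothetical deployment address, and the search parameter follows the 'query' form used in the examples above:

# Query the Semantic Gateway for concepts matching a search term.
import requests

GATEWAY = "http://localhost:8090/"  # hypothetical gateway address

def lookup(vocab, query, lang=None):
    """Return candidate concepts for a term in the given vocabulary."""
    params = {"vocab": vocab, "query": query}
    if lang:
        params["lang"] = lang  # optional language of the labels
    resp = requests.get(GATEWAY, params=params)
    resp.raise_for_status()
    return resp.json()

print(lookup("unesco", "fam", lang="en"))
print(lookup("mesh", "sars"))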
23. Use case: CMDI, a hierarchical metadata schema
Some conclusions:
● top-level concepts (CMDI components) can share the same concepts
● relations between concepts define the metadata schema
● disambiguation of concepts is complicated
● multilingual components have a language indication (for example, keywords in Dutch)
● the hierarchy is defined by semantics
24. Use case: CMDI data model and namespaces
A default namespace was added in the Semantic Gateway for the CMDI schema to keep all relationships between top-level concepts (metadata fields) in the knowledge graph:
ns.dataverse.org/cmdi_component/cmdi_term
However, a component or element in CMDI has a unique name only among its siblings, so component-specific URIs are needed (next slide).
Source: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from CLARIN to Linked Open Data
25. Adding component-specific URIs in SKOS
The CMDI Component Registry was created for registered components/profiles.
Example path in CMDI:
/CMD/Components/corpusProfile/resourceCommonInfo/metadataInfo/metadataCreator/actorInfo/actorType
ns.dataverse.org/cmdi1/metadataCreator skos:broader ns.dataverse.org/cmdi1/actorInfo
or simply: cmdi1:metadataCreator skos:related cmdi1:corpusProfile
CMDI concepts can be linked to other SKOS concepts in a next step.
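One way to generate such component-specific SKOS statements from a registry path, sketched with rdflib; whether skos:broader should point from child to parent (as below) or the other way around is a modelling choice:

# Derive a chain of skos:broader links from a CMDI component path,
# minting URIs under the shared ns.dataverse.org/cmdi1/ namespace.
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

NS = "http://ns.dataverse.org/cmdi1/"
path = "corpusProfile/resourceCommonInfo/metadataInfo/metadataCreator/actorInfo/actorType"

g = Graph()
g.bind("skos", SKOS)
parts = path.split("/")
for parent, child in zip(parts, parts[1:]):
    # each child component points to its parent as the broader concept
    g.add((URIRef(NS + child), SKOS.broader, URIRef(NS + parent)))

print(g.serialize(format="turtle"))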
26. How can we link CMDI components in SKOS?
Source: CMDI Component Registry
27. Export from Dataverse metadata back to CMDI
Basic requirements:
The Dataverse metadata schema should contain the CMDI metadata and allow it to be extended with the custom components used by CLARIN centres in different countries.
The original relationships between fields and concepts should be kept; custom components should be added to the SKOS schema.
Users should be able to download metadata in the original CMDI format without losing quality.
28. The FAIR Signposting Profile
Herbert Van de Sompel, DANS Chief Innovation Officer
https://hvdsomp.info
Two levels of access to Web resources:
● level one provides a concise, minimal set of links by value in the HTTP header
● level two delivers a complete, comprehensive set of links by reference, i.e. in a standalone document (a link set)
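A sketch of reading the level-one links from Python, assuming the requests library and a placeholder landing-page URL; requests exposes the parsed HTTP Link header through response.links:

# Read Signposting links (level one) from a landing page's Link header.
import requests

resp = requests.head("https://example.org/dataset/landing-page")
for rel, link in resp.links.items():
    print(rel, "->", link.get("url"))
# for level two, one of the links (rel="linkset") points to a standalone
# document containing the complete set of links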
29. Dataverse meta(data) in a FAIR Data Point (FDP)
● a RESTful web service that enables data owners to expose their datasets using rich machine-readable metadata
● provides standardized descriptions (RDF-based metadata) using controlled vocabularies and ontologies
● the FDP spec is public
The goal is to run an FDP on the Dataverse side (DCAT, CVs) and provide metadata export in RDF!
Source: FDP