DataverseNL is a data repository service started in 2014 by DANS as a shared service for 15 Dutch institutions. It currently contains over 200 dataverses and 450 datasets that have been downloaded over 7,000 times. DANS aims to use DataverseNL for ongoing research projects and then archive finalized datasets in its Trusted Digital Repository (TDR) for permanent preservation. DataverseNL serves as a collaboration platform and integration point for sharing research data across Dutch universities and organizations. DANS is working to link DataverseNL metadata to semantic web vocabularies and expose it as linked open data.
Slides from my workshop at Open Repositories 2016 about DSpace's Linked Data support. The slides include a short introduction into the Semantic Web and Linked Data, the main ideas behind the Linked Data support of DSpace, information on how to configure this feature and some examples about how to query DSpace installations for Linked Data.
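To illustrate the last point, here is a minimal sketch of querying a DSpace installation's Linked Data via the SPARQL protocol. The endpoint URL is a placeholder (each installation configures its own endpoint, typically a triple store such as Fuseki behind the repository); the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical SPARQL endpoint of a DSpace installation.
ENDPOINT = "https://demo.dspace.org/sparql"

# List a few items by their Dublin Core title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item ?title WHERE {
  ?item dcterms:title ?title .
} LIMIT 10
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request,
    asking for results in the standard JSON format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_sparql_request(ENDPOINT, QUERY)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON document with one binding per matching item.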
The document discusses open data and challenges with publishing open data. It introduces Entryscape Catalog as a solution for easily, explicitly, and quickly publishing open data through intuitive interfaces with minimum manual work. Entryscape Catalog allows describing data through standard-based forms, publishing data one item at a time or all at once, uploading existing non-open data, and creating APIs from tabular data with a click.
Wikidata is a free and open knowledge base that can be edited by anyone to store structured data. It currently holds over 33.5 million items and has received over 1.9 billion edits, with labels in 287 languages. Wikidata provides structured, collaborative, free, open, multilingual, and referenced data through its API and licenses its data under CC0 to allow easy access and reuse. It helps projects like Wikipedia by providing integrated access to its data and supports smaller languages and communities through micro-contributions. In 2015, Google's Freebase project moved its data to Wikidata, increasing its scope and ecosystem.
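One concrete way this openness shows: any item's structured data can be fetched as JSON from the Special:EntityData endpoint. A small sketch of building that URL (Q42 is the item for Douglas Adams, a common example identifier):

```python
# Sketch: the URL from which Wikidata serves an item's structured
# data as JSON via the Special:EntityData endpoint.
def entity_data_url(qid: str) -> str:
    """Return the JSON data URL for a Wikidata entity id (Q/P/L)."""
    if not qid.startswith(("Q", "P", "L")):
        raise ValueError("expected a Wikidata entity id such as Q42")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

print(entity_data_url("Q42"))
# https://www.wikidata.org/wiki/Special:EntityData/Q42.json
```

Fetching that URL returns the item's labels, descriptions, and claims in all languages, which is what makes reuse under CC0 straightforward.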
Running Dataverse repository in the European Open Science Cloud (EOSC), by vty
The document discusses Dataverse, an open source data repository software. It summarizes that Dataverse was developed by Harvard University, has a large community and development team, and is used by many countries as a data repository infrastructure. It then describes the SSHOC Dataverse project which aims to create a multilingual, standardized, and reusable open data infrastructure across several European countries. Finally, it notes that Dataverse is a reliable cloud service that enables FAIR data sharing and can be easily deployed by research organizations.
Slides used to introduce the technical aspects of DSpace-CRIS to the technical staff of the Hamburg University of Technology.
Main topics:
The DSpace-CRIS data model: additional entities, interactions with the DSpace data model (authority framework), enhanced metadata, inverse relationship
ORCID integration & technical details: available features & use cases (authentication, authorization, profile claiming, profile synchronization push & pull, registry lookup), configuration, API-KEY, use of the sandbox, metadata mapping
An Approach for RDF-based Semantic Access to NoSQL Repositories, presented as a partial requirement for the course "Metodologia da Pesquisa em Ciência da Computação" at UFSC, 2015
TrunkDB is the new cloud-based version of ORDs (Oxford Research Database Service), which was originally designed to provide database hosting and manipulation services for researchers. TrunkDB allows researchers to create multiple versions of databases, share data with colleagues, and access data securely from anywhere through an online interface. It aims to support researchers by treating their data, rather than just the database, as the primary object and allowing various ways of organizing, updating, and viewing data over time through a versioning system. TrunkDB is currently in private beta testing with plans to launch publicly in June.
Extending DSpace 7: DSpace-CRIS and DSpace-GLAM for empowered repositories an..., by 4Science
DSpace-CRIS is an extended version of DSpace that offers a powerful and flexible data model to describe not only publications but all research entities and their relationships. DSpace-CRIS 7 will feature a new Angular UI and REST API in addition to functionality for compliance with OpenAIRE, integrating publications from external sources, bidirectional ORCID integration, and synchronizing with other systems. DSpace-CRIS also extends data modeling capabilities and provides tools for data quality, metadata management, and extensibility.
This document discusses exposing Nobel Prize data as linked open data. It describes a two phase approach: 1) exposing the data externally to spread information and enable other apps, and 2) using the linked data internally to improve data quality and enhance webpages. It provides details on publishing the dataset, interlinking it with other datasets, and technical implementations like a SPARQL endpoint and linked data cache. The goal is to increase the value of Nobel Prize information for their organization and audiences while also contributing to the larger linked open data cloud.
DSP3B: DSpace Interest Group 3B: DSpace-CRIS Workshop · 11/Jun/2015: 3:30pm-5:00pm · Location: Regency E
DSpace-CRIS Workshop
Andrea Bollini, Luigi Andrea Pascarelli, Michele Mennielli, David Palmer
Cineca, Italy; Hong Kong University
The 90-minute workshop will introduce attendees to the latest version of the DSpace-CRIS module, covering its functional and technical aspects.
DSpace-CRIS is an additional open-source module for the DSpace platform. It extends the DSpace data model, providing the ability to manage, collect and expose data about any entity of the research domain, such as people, organizational units, projects, grants, awards, patents, publications, and so on. Before OR2015, a new version of the system will be released to align with DSpace 5.0. The new version contains, among other things, important enhancements to its ORCID integration.
The DSpace-CRIS extensible data model will be explained in depth, through examples and discussion with participants.
Other main topics are DSpace-CRIS "components", management of relationships and network analysis functionalities.
At the end of the workshop, participants will be able to:
- understand the DSpace-CRIS data model
- evaluate if DSpace-CRIS fits the requirements of their institution
- use the DSpace-CRIS User Interface
- change the default configuration, adapting it to a specific data model.
Presented at OR2017 - Brisbane
Panel Discussion: COAR Next Generation Repositories: Results and Recommendations
The presentation focuses on the recommended technologies to implement in repository platforms.
The nearly ubiquitous deployment of repository systems in higher education and research institutions provides the foundation for a distributed, globally networked infrastructure for scholarly communication. However, repository platforms are still using technologies and protocols designed almost twenty years ago, before the boom of the Web and the dominance of Google, social networking, semantic web and ubiquitous mobile devices.
To that end, in April 2016, COAR launched a working group to identify the technologies and architectures of the next generation of repositories. There are two threads to our work: (1) increase the exposure by repositories of uniform behaviors that can be used by machine agents to fuel novel scholarly applications reaching beyond the scope of a single repository, and that enable repository content to be smoothly embedded into mainstream web applications; (2) integrate with existing scholarly infrastructures, specifically those aimed at identification, as a means to solidly embed repositories in the overall scholarly communication landscape.
This panel will present the results of the COAR Next Generation Repositories Working Group including our vision, design assumptions, use cases, architectural and technical recommendations, and next steps. The session will also include time for audience discussion and feedback.
DSpace-CRIS: new features and contribution to the DSpace mainstream, by Andrea Bollini
The presentation focuses on the latest releases of DSpace-CRIS, compatible with DSpace 5 and 6, with exciting new features. Particularly interesting is the recent integration between DSpace-CRIS and CKAN, released as an independent module. The DSpace-CKAN Integration Module has already been released as open source (under the same license as DSpace) and can easily be adopted by standard DSpace installations as well, with either the JSPUI or the XMLUI.
Starting with DSpace-CRIS 5.6.1, along with the security fixes of DSpace JSPUI 5.6, the following features have been introduced: an extensible UI to deliver bitstreams with dedicated viewers; simple metadata editing of any DSpace object; editing of archived items using the submission UI; a deduplication and duplicate-alert tool; improved ORCID synchronization; an improved submission form; an improved security model for CRIS entities; creation of CRIS objects as part of the submission process; automatic calculation of metrics; an advanced import framework; on-demand DOI registration; and template services.
The DSpace-CKAN Integration Module allows users to directly preview, from DSpace, the dataset content deposited in a CKAN instance via a “curation task”. DSpace-CRIS and DSpace-CKAN will be supported by 4Science for future major versions of the platform as well, and the roadmap to DSpace 7 compatibility will also be presented.
Here I motivate the need to improve the ways in which we access the integration-friendly space of Linked Data, by bridging the gap between various Linked Data querying methods, obtaining links to these queries, and providing RESTful APIs based on them.
CLARIAH Toogdag 2018: A distributed network of digital heritage information, by Enno Meijers
Slides of my keynote at the CLARIAH Toogdag 2018 on 9 March at the National Library of the Netherlands. The main topics were the development of the distributed digital heritage network and the alignment to and cooperation with the CLARIAH infrastructure and data. It also points at some of the current limitations of the semantic web technology.
The document outlines the vision, mission, and strategy of the STFC (Science and Technology Facilities Council) in implementing e-Science technologies. The goals are to exploit data from STFC facilities through innovative infrastructure, integrate activities nationally and internationally, and improve computation and data management capabilities to enable new scientific discoveries.
DSpace-CRIS: an open source solution - Cineca euroCRIS membership meeting Por..., by Andrea Bollini
The idea of DSpace-CRIS has its origin in 2009, when the Hong Kong University decided to extend the information exposed in their DSpace IR by adding information (people/projects) coming from other systems already in use, mainly for administrative purposes: a CRIS.
One year ago, in November 2012, DSpace-CRIS was released as an open source solution to enrich DSpace (1.8.2). After highlighting the important steps made by the DSpace Community in 2013, which will lead to the final release of DSpace 4.0 in December, Cineca focused its presentation on what DSpace-CRIS is today.
The most important announcement was that DSpace-CRIS is now compatible and compliant with the CERIF standard, and that an export feature in CERIF XML will be available in the DSpace-CRIS 4.0 version. Indeed, the key components of the CERIF data model are supported natively: UUIDs, timestamped relations, and semantic characterization.
In addition to that, the dynamic, flexible, non-hardcoded approach of the DSpace-CRIS data model makes it very easy to create new entities (besides the few predefined ones) and configure instances compliant with CERIF.
There are several advantages that DSpace-CRIS brings to Institutional Repositories and to the DSpace community overall:
- CRIS entities as authority for Item metadata values;
- DSpace Items can be linked and displayed in the detail page of any CRIS entities;
- Ability to display selected publications (or any other related entities) in the researcher profile;
- It is possible to create lists of selected publications (or any other related entities);
- Visit statistics for CRIS entity detail pages;
- Global & Top related CERIF Entity views & downloads referencing the CRIS entity (projects for researchers, researchers for OrgUnits, etc.);
- Global & Top item views & downloads referencing the CRIS entity;
- email and RSS alerts;
- Article level metrics for PubMed (extensible):
- Cited-by count in the item page
- Number of articles for researcher
- Total citations for researcher (only items in local DSpace database will be counted)
This document discusses APIs and the API economy. It defines an API as an interface between software systems that allows them to interact. APIs provide programmatic access to systems and processes within organizations. Building APIs improves digital ecosystems by enabling data sharing, reuse, and rapid prototyping. The document advocates for an open innovation approach where organizations use both internal and external knowledge through APIs to accelerate innovation. It presents a vision of APIs managing complex systems and data as products that fuel collaboration across communities.
This document provides an overview of the ResourceSync framework for synchronizing web resources between a source and destinations. It describes the key capabilities a source can provide, including describing available content through resource lists and dumps, describing changes through change lists and dumps, and archiving capability documents. Destinations need baseline and incremental synchronization, and the ability to audit synchronization status. Use cases demonstrate the need for high-volume, low-latency synchronization between sources like arXiv and DBpedia. The framework supports modular capabilities that destinations can use selectively for efficient synchronization aligned with web standards.
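To make the "resource list" capability concrete, here is a small sketch of parsing a ResourceSync Resource List, which is a sitemap document extended with elements from the ResourceSync namespace. The document below is hand-written for illustration; the URLs are made up.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"

# A minimal, hand-written Resource List: a sitemap <urlset> carrying
# an <rs:md> element that declares which ResourceSync capability
# this document implements.
RESOURCE_LIST = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist"/>
  <url><loc>http://example.com/res1</loc></url>
  <url><loc>http://example.com/res2</loc></url>
</urlset>"""

def parse_resource_list(xml_text: str):
    """Return the declared capability and the listed resource URIs."""
    root = ET.fromstring(xml_text)
    capability = root.find(RS + "md").get("capability")
    locs = [url.findtext(SM + "loc") for url in root.findall(SM + "url")]
    return capability, locs

cap, locs = parse_resource_list(RESOURCE_LIST)
print(cap, locs)
```

A destination performing baseline synchronization would fetch each `<loc>` URI; Change Lists use the same sitemap shape but declare the `changelist` capability and add per-resource change metadata.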
Repositories are systems mainly used to store and publish academic content. This presentation discusses why repository contents should be published as Linked (Open) Data and how repositories can be extended to do so.
This presentation has been used to start the pilot phase of the OpenAIRE Advance-funded implementation project in DSpace-CRIS.
DSpace-CRIS now provides support for the OpenAIRE Guidelines for CRIS Managers, in addition to the already supported guidelines for Literature Repositories and Data Archives.
Nanopublications and Decentralized Publishing, by Tobias Kuhn
1) Current methods of publishing and sharing research results and data pose problems regarding verifiability, immutability, and permanence over time.
2) Nanopublications use cryptographic hashes to create "Trusty URIs" that make digital objects verifiable, immutable, and permanent by linking identifiers to content.
3) A decentralized network of nanopublication servers allows for open, real-time publishing and retrieval of nanopublications without a central authority through propagation across nodes.
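The core of the Trusty URI idea can be sketched in a few lines: the identifier ends in a hash of the content, so anyone holding both can re-derive the suffix and verify the artifact was not modified. This is a greatly simplified illustration that hashes raw bytes; real Trusty URIs apply a specific normalization to the RDF content before hashing.

```python
import base64
import hashlib

def trusty_uri(base_uri: str, content: bytes) -> str:
    """Simplified sketch: append a URL-safe base64 SHA-256 digest of
    the content to the identifier. 'RA' marks the hashing scheme, as
    in actual Trusty URIs."""
    digest = hashlib.sha256(content).digest()
    suffix = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return f"{base_uri}.RA{suffix}"

def verify(uri: str, content: bytes) -> bool:
    """Recompute the URI from the content and compare."""
    base, _, _ = uri.rpartition(".RA")
    return trusty_uri(base, content) == uri

uri = trusty_uri("http://example.org/np1", b"assertion graph bytes")
print(verify(uri, b"assertion graph bytes"))  # True
print(verify(uri, b"tampered bytes"))         # False
```

Because the identifier commits to the content, any server in the decentralized network can serve a copy and the client can still check its integrity, which is what removes the need for a trusted central authority.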
This document provides an introduction and overview of the ORCID API. It discusses the ORCID data model, security model, XML structure, and how to access and use the API. The ORCID API uses OAuth2 for authorization and allows developers to retrieve, update, and add researcher profiles and activities like publications, affiliations, and funding through API calls. Documentation and support resources are also referenced.
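As a small illustration of the OAuth2 flow mentioned above, here is a sketch of the client-credentials request used to obtain a token for reading public data, targeting the ORCID sandbox. The client id and secret are placeholders; the request body is only constructed, not sent.

```python
from urllib.parse import urlencode

# ORCID sandbox token endpoint (production uses orcid.org instead).
TOKEN_URL = "https://sandbox.orcid.org/oauth/token"

def token_request_body(client_id: str, client_secret: str) -> str:
    """Form-encoded body for POSTing to the ORCID token endpoint,
    requesting a read-public token via the client-credentials grant."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": "/read-public",
    })

# Placeholder credentials for illustration only.
body = token_request_body("APP-XXXXXXXX", "secret")
print(body)
```

POSTing this body to `TOKEN_URL` with `Accept: application/json` returns an access token, which is then sent as a Bearer header on subsequent API calls.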
Linked Data Notifications: Distributed Update Notification and Propagation on ..., by Aksw Group
Distributed Update Notification and Propagation on the Web of Data with: Linked Data Notifications, PubSubHubbub, Semantic Pingback and Structured Feedback
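A minimal sketch of what an LDN notification might look like: a small JSON-LD document, POSTed to the target resource's advertised inbox with `Content-Type: application/ld+json`. The URLs and the choice of the ActivityStreams vocabulary are illustrative; LDN itself fixes only the delivery mechanism, not the payload vocabulary.

```python
import json

def citation_notification(source: str, target: str) -> str:
    """Build a JSON-LD notification announcing that `source`
    references `target`. Vocabulary choice is an assumption made
    for this example."""
    payload = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "object": source,
        "target": target,
    }
    return json.dumps(payload)

doc = citation_notification("http://example.org/article1",
                            "http://example.org/article2")
print(doc)
```

A receiver discovers the inbox via an `ldp:inbox` link on the target resource and accepts such POSTs; consumers can later list and fetch the stored notifications from the same inbox URL.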
Nelson Piedra, Janneth Chicaiza and Jorge López, Universidad Técnica Particular de Loja; Edmundo Tovar, Universidad Politécnica de Madrid; and Oscar Martínez, Universitas Miguel Hernández
Explore the advantages of using linked data with OERs.
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Decentralised identifiers and knowledge graphs vty
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
The benefits of Linked Data are well known, but the supporting software ecosystem is still somewhat lacking. During this presentation we will look into the approach taken by Joinup: How we start from a formalized ontology, and map this to the Joinup website. We’ll give an overview of the Open Source components that we created for building linked data based CMS applications.
Enterprise knowledge graphs use semantic technologies like RDF, RDF Schema, and OWL to represent knowledge as a graph consisting of concepts, classes, properties, relationships, and entity descriptions. They address the "variety" aspect of big data by facilitating integration of heterogeneous data sources using a common data model. Key benefits include providing background knowledge for various applications and enabling intra-organizational data sharing through semantic integration. Challenges include ensuring data quality, coherence, and managing updates across the knowledge graph.
This document discusses the generation of linked data platforms (LDPs) in highly decentralized information ecosystems. It presents a model for automating the generation of LDPs that considers data heterogeneity, hosting constraints, and reusability of LDP designs. The model includes an LDP generation workflow, a design language called LDP-DL to describe LDP designs, and an LDP generation toolkit to implement the workflow. The goal is to facilitate data exploitation for consumers in decentralized environments.
This document discusses the 5 year evolution of Dataverse, an open source data repository platform. It began as a tool for collaborative data curation and sharing within research teams. Over time, features were added like dataset version control, APIs, and integration with other systems. The document outlines challenges around maintenance and sustainability. It also covers efforts to improve Dataverse's interoperability, such as integrating metadata standards and controlled vocabularies, and making datasets FAIR compliant. The goal is to establish Dataverse as a core component of the European Open Science Cloud by improving areas like software quality, integration with tools, and standardization.
Cloud-based Linked Data Management for Self-service Application Development - Peter Haase
Peter Haase and Michael Schmidt of fluid Operations AG presented on developing applications using linked open data. They discussed the increasing amount of linked open data available and challenges in building applications that integrate data from different sources and domains. Their Information Workbench platform aims to address these challenges by allowing users to discover, integrate, and customize applications using linked data in a no-code environment. Key components of the platform include virtualized integration of data sources and the vision of accessing linked data as a cloud-based data service.
Linked Data allows evolving the web into a global data space by publishing structured data on the web using RDF and by linking data items across different data sources. It follows the Linked Data principles of using URIs to identify things and HTTP URIs to look up those names, providing useful RDF information when URIs are dereferenced, and including RDF links to discover related data. The amount of published Linked Data on the web has grown enormously since 2007. Large data sources like DBpedia extract structured data from Wikipedia and act as hubs by interlinking different data sets, enabling new applications and search over integrated data.
Morning session talk at the second Keystone Training School, "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Building COVID-19 Museum as Open Science Project - vty
This document discusses building a COVID-19 Museum as an open science project. It describes the speaker's background working on various data management projects. It discusses moving towards open science and sharing data according to FAIR principles. It outlines the Time Machine project for digitizing historical documents and its approach to data management. The rest of the document discusses using the Dataverse platform to build repositories, linking metadata to ontologies, using tools like Weblate for translations, and exploring the use of artificial intelligence and machine learning to enhance metadata and facilitate human-in-the-loop review processes.
This presentation addresses the main issues of Linked Data and scalability. In particular, it provides details on approaches and technologies for clustering, distributing, sharing, and caching data. Furthermore, it addresses the means for publishing data through cloud deployment and the relationship between Big Data and Linked Data, exploring how some of the solutions can be transferred to the context of Linked Data.
Big Data Architecture Workshop - Vahid Amiri (datastack)
This slide deck covers big data tools, technologies, and layers that can be used in enterprise solutions.
TopHPC Conference, 2019
The document discusses NoSQL technologies including Cassandra, MongoDB, and ElasticSearch. It provides an overview of each technology, describing their data models, key features, and comparing them. Example documents and queries are shown for MongoDB and ElasticSearch. Popular use cases for each are also listed.
Linked Data, the Semantic Web, and You discusses key concepts related to Linked Data and the Semantic Web. It defines Linked Data as a set of best practices for publishing and connecting structured data on the web using URIs, HTTP, RDF, and other standards. It also explains semantic web technologies like RDF, ontologies, SKOS, and SPARQL that enable representing and querying structured data on the web. Finally, it discusses how libraries are applying these concepts through projects like BIBFRAME, FAST, library linked data platforms, and the LD4L project to represent bibliographic data as linked open data.
Decentralised identifiers for CLARIAH infrastructure vty
Slides of the presentation for CLARIAH community on the ideas how to make controlled vocabularies sustainable and FAIR (Findable, Accessible, Interoperable, Reusable) with the help of Decentralized Identifiers (DIDs).
Dataverse repository for research data in the COVID-19 Museum - vty
The COVID-19 Museum has the ambition to create a platform to deposit, consult, aggregate, and study heterogeneous data about the pandemic using features of a distributed web service. To achieve this purpose, Dataverse has been selected as a reliable FAIR data repository with a built-in search engine and functionality that allows adding computing resources to explore archived resources, both data and metadata. Presentation by Slava Tykhonov, DANS-KNAW (The Royal Netherlands Academy of Arts and Sciences). Université Paris Cité, 19 April 2022.
Building collaborative Machine Learning platform for Dataverse network. Lecture by Slava Tykhonov (DANS-KNAW, the Netherlands), DANS seminar series, 29.03.2022
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN... - vty
Presentation at the ISKO Knowledge Organisation Research Observatory: "Research Repositories and Dataverse: Negotiating Metadata, Vocabularies and Domain Needs".
The presentation for the W3C Semantic Web in Health Care and Life Sciences community group by Slava Tykhonov, DANS-KNAW, the Royal Netherlands Academy of Arts and Sciences (October 2020). The recording is available at https://www.youtube.com/watch?v=G9oiyNM_RHc
CLARIN CMDI use case and flexible metadata schemes vty
Presentation for the CLARIAH IG Linked Open Data on the latest developments for the Dataverse FAIR data repository: building a SEMAF workflow with external controlled vocabulary support and a Semantic API, and using TRIZ, the theory of inventive problem solving, for further innovation in Linked Data.
Flexible metadata schemes for research data repositories - CLARIN Conference'21 - vty
The development of the Common Framework in Dataverse and the CMDI use case. Building an AI/ML-based workflow for predicting and linking concepts from external controlled vocabularies to CMDI metadata values.
Controlled vocabularies and ontologies in Dataverse data repository - vty
This document discusses supporting external controlled vocabularies in Dataverse. It proposes implementing a JavaScript interface to allow linking metadata fields to terms from external vocabularies accessed via SKOSMOS APIs. Several challenges are identified, such as applying support to any field, backward compatibility, and ensuring vocabularies come from authoritative sources. Caching concepts and linking dataset files directly to terms are also proposed to improve interoperability.
Automated CI/CD testing, installation and deployment of Dataverse infrastruct... - vty
This document summarizes a presentation about automating CI/CD testing, installation, and deployment of Dataverse in the European Open Science Cloud. It discusses using Docker and Kubernetes for deployment, a community-driven QA plan using pyDataverse for test automation, and providing quality assurance as a service. The presentation also covers topics like the CESSDA maturity model, integrating Dataverse on Google Cloud, and using serverless computing for some Dataverse applications and services.
External controlled vocabularies support in Dataverse - vty
This presentation discusses adding support for external controlled vocabularies to the Dataverse data repository platform. It describes how ontologies like SKOS can be used to represent vocabularies and allow linking metadata fields in Dataverse to terms. The presentation proposes developing a Semantic Gateway plugin for Dataverse that would allow browsing and linking to external vocabularies hosted in the SKOSMOS framework via its API. This could improve metadata by allowing standardized, linked terms and help make data more FAIR.
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse - vty
This presentation is about external CVs support in Dataverse, Open Source data repository. Data Archiving and Networked Services (DANS-KNAW) decided to use Dataverse as a basic technology to build Data Stations and provide FAIR data services for various Dutch research communities.
Ontologies, controlled vocabularies and Dataverse - vty
Presentation on Semantic Web technologies for the Dataverse Metadata Working Group, run by the Institute for Quantitative Social Science (IQSS) of Harvard University.
Dataverse can be deployed using Docker containers to improve maintainability and portability. The document discusses how Docker can isolate applications and their dependencies into portable containers. It provides an example of deploying Dataverse as a set of microservices within Docker containers. Instructions are included on building Docker images, running containers, and managing the containers and images through commands and tools like Docker Desktop, Docker Hub, and Docker Compose.
Technical integration of data repositories: status and challenges - vty
This document discusses technical integration of data repositories, including:
- Previous integration initiatives focused on metadata integration using OAI-PMH and ResourceSync protocols, as well as aggregators like OpenAIRE.
- Challenges to integration include different levels of software/service maturity, maintenance of distributed applications, and use of common standards and vocabularies.
- Potential integration efforts could focus on improving FAIRness, metadata/data flexibility, and connections between repositories, software, and computing resources to better enable reuse of EOSC data and services.
SDSS1335+0728: The awakening of a ∼10^6 M⊙ black hole - Sérgio Sacani
Context. The early-type galaxy SDSS J133519.91+072807.4 (hereafter SDSS1335+0728), which had exhibited no prior optical variations during the preceding two decades, began showing significant nuclear variability in the Zwicky Transient Facility (ZTF) alert stream from December 2019 (as ZTF19acnskyy). This variability behaviour, coupled with the host-galaxy properties, suggests that SDSS1335+0728 hosts a ∼10^6 M⊙ black hole (BH) that is currently in the process of ‘turning on’. Aims. We present a multi-wavelength photometric analysis and spectroscopic follow-up performed with the aim of better understanding the origin of the nuclear variations detected in SDSS1335+0728. Methods. We used archival photometry (from WISE, 2MASS, SDSS, GALEX, eROSITA) and spectroscopic data (from SDSS and LAMOST) to study the state of SDSS1335+0728 prior to December 2019, and new observations from Swift, SOAR/Goodman, VLT/X-shooter, and Keck/LRIS taken after its turn-on to characterise its current state. We analysed the variability of SDSS1335+0728 in the X-ray/UV/optical/mid-infrared range, modelled its spectral energy distribution prior to and after December 2019, and studied the evolution of its UV/optical spectra. Results. From our multi-wavelength photometric analysis, we find that: (a) since 2021, the UV flux (from Swift/UVOT observations) is four times brighter than the flux reported by GALEX in 2004; (b) since June 2022, the mid-infrared flux has risen more than two times, and the W1−W2 WISE colour has become redder; and (c) since February 2024, the source has begun showing X-ray emission. From our spectroscopic follow-up, we see that (i) the narrow emission line ratios are now consistent with a more energetic ionising continuum; (ii) broad emission lines are not detected; and (iii) the [OIII] line increased its flux ∼3.6 years after the first ZTF alert, which implies a relatively compact narrow-line-emitting region. Conclusions.
We conclude that the variations observed in SDSS1335+0728 could be either explained by a ∼10^6 M⊙ AGN that is just turning on or by an exotic tidal disruption event (TDE). If the former is true, SDSS1335+0728 is one of the strongest cases of an AGN observed in the process of activating. If the latter were found to be the case, it would correspond to the longest and faintest TDE ever observed (or another class of still unknown nuclear transient). Future observations of SDSS1335+0728 are crucial to further understand its behaviour. Key words. galaxies: active – accretion, accretion discs – galaxies: individual: SDSS J133519.91+072807.4
Topic of discussion: Centrifugation - shubhijain836
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
PPT on Direct Seeded Rice presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
(June 12, 2024) Webinar: Development of PET theranostics targeting the molecu... - Scintica Instrumentation
Targeting Hsp90 and its pathogen Orthologs with Tethered Inhibitors as a Diagnostic and Therapeutic Strategy for cancer and infectious diseases with Dr. Timothy Haystead.
Mechanisms and Applications of Antiviral Neutralizing Antibodies - Creative-Biolabs
Neutralizing antibodies, pivotal in immune defense, specifically bind and inhibit viral pathogens, thereby playing a crucial role in protecting against and mitigating infectious diseases. In this slide, we will introduce what antibodies and neutralizing antibodies are, the production and regulation of neutralizing antibodies, their mechanisms of action, classification and applications, as well as the challenges they face.
Signatures of wave erosion in Titan’s coasts - Sérgio Sacani
The shorelines of Titan’s hydrocarbon seas trace flooded erosional landforms such as river valleys; however, it is unclear whether coastal erosion has subsequently altered these shorelines. Spacecraft observations and theoretical models suggest that wind may cause waves to form on Titan’s seas, potentially driving coastal erosion, but the observational evidence of waves is indirect, and the processes affecting shoreline evolution on Titan remain unknown. No widely accepted framework exists for using shoreline morphology to quantitatively discern coastal erosion mechanisms, even on Earth, where the dominant mechanisms are known. We combine landscape evolution models with measurements of shoreline shape on Earth to characterize how different coastal erosion mechanisms affect shoreline morphology. Applying this framework to Titan, we find that the shorelines of Titan’s seas are most consistent with flooded landscapes that subsequently have been eroded by waves, rather than a uniform erosional process or no coastal erosion, particularly if wave growth saturates at fetch lengths of tens of kilometers.
The cost of acquiring information by natural selection - Carl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at z = 2.9 wi... - Sérgio Sacani
We present the JWST discovery of SN 2023adsy, a transient object located in the host galaxy JADES-GS+53.13485−27.82088 with a host spectroscopic redshift of 2.903 ± 0.007. The transient was identified in deep James Webb Space Telescope (JWST)/NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) program. Photometric and spectroscopic followup with NIRCam and NIRSpec, respectively, confirm the redshift and yield UV-NIR light-curve, NIR color, and spectroscopic information all consistent with a Type Ia classification. Despite its classification as a likely SN Ia, SN 2023adsy is both fairly red (E(B−V) ∼ 0.9), despite a host galaxy with low extinction, and has a high Ca II velocity (19,000 ± 2,000 km/s) compared to the general population of SNe Ia. While these characteristics are consistent with some Ca-rich SNe Ia, particularly SN 2016hnk, SN 2023adsy is intrinsically brighter than the low-z Ca-rich population. Although such an object is too red for any low-z cosmological sample, we apply a fiducial standardization approach to SN 2023adsy and find that the SN 2023adsy luminosity distance measurement is in excellent agreement (≲ 1σ) with ΛCDM. Therefore, unlike low-z Ca-rich SNe Ia, SN 2023adsy is standardizable and gives no indication that SN Ia standardized luminosities change significantly with redshift. A larger sample of distant SNe Ia is required to determine if SN Ia population characteristics at high-z truly diverge from their low-z counterparts, and to confirm that standardized luminosities nevertheless remain constant with redshift.
PPT on Alternate Wetting and Drying presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Mending Clothing to Support Sustainable Fashion (CIMaR 2024) - Selcen Ozturkcan
Ozturkcan, S., Berndt, A., & Angelakis, A. (2024). Mending clothing to support sustainable fashion. Presented at the 31st Annual Conference by the Consortium for International Marketing Research (CIMaR), 10-13 Jun 2024, University of Gävle, Sweden.
Linked Open Data and DANS
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
Linked Open Data and DANS
Reinier de Valk
reinier.de.valk@dans.knaw.nl
Vyacheslav Tykhonov
vyacheslav.tykhonov@dans.knaw.nl
NOTaS meeting, The Hague, 15.12.2017
2. LOD | Linked (Open) Data?
• Linked Data (LD) is “a method of publishing structured data so that it
can be interlinked and become more useful through semantic queries” [1]
• Linked Open Data (LOD) is LD that is open, i.e., freely available to use
and republish
• Builds upon standard web technologies, but extends them so that they
can be read by machines
• Semantic web: a web of data that can be processed by machines
[1] https://en.wikipedia.org/wiki/Linked_data
3. LOD | Four principles of LD [2]
• Use uniform resource identifiers (URIs) as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the
standards (RDF*, SPARQL)
• Include links to other URIs, so that they can discover more things
[2] Berners-Lee, T. (2006) Linked data. https://www.w3.org/DesignIssues/LinkedData.html
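The third principle above can be illustrated with a small sketch: a client asks for machine-readable RDF rather than HTML via HTTP content negotiation. The DBpedia URI is a real example resource, but the request is only constructed here, not sent.

```python
import urllib.request

# Sketch of principle 3 (dereferenceable HTTP URIs): by sending an
# Accept header, a client asks the server for RDF (here Turtle)
# instead of an HTML page for the same URI.
uri = "http://dbpedia.org/resource/Amsterdam"
req = urllib.request.Request(uri, headers={"Accept": "text/turtle"})

# urllib.request.urlopen(req) would perform the lookup and, for a
# Linked Data server, return Turtle triples describing the resource.
print(req.full_url, req.get_header("Accept"))
```

The same URI thus serves both people (HTML in a browser) and machines (RDF), which is what makes the looked-up names "useful" in the sense of the principles.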
4. LOD | Building block: the triple
• The basic building block of LD is the semantic triple (or simply triple)
• A triple is a statement of the form subject-predicate-object:
<http://example.name#Bob> <http://purl.org/vocab/relationship/childOf> <http://example.name#Alice> .
<http://example.name#Carl> <http://purl.org/vocab/relationship/childOf> <http://example.name#Alice> .
• Triples are stored in triplestores (purpose-built databases) or graph
databases (databases with a more generalised structure)
• These databases can be queried with query languages such as
SPARQL; this is done using a (SPARQL) endpoint
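The triple model and pattern-based querying above can be sketched in a few lines: each statement is a (subject, predicate, object) tuple, and a wildcard match plays the role of a SPARQL variable. This is a toy in-memory store for illustration, not a real triplestore API.

```python
# Each statement is a (subject, predicate, object) tuple,
# mirroring the two example triples from the slide.
CHILD_OF = "http://purl.org/vocab/relationship/childOf"

triples = [
    ("http://example.name#Bob",  CHILD_OF, "http://example.name#Alice"),
    ("http://example.name#Carl", CHILD_OF, "http://example.name#Alice"),
]

def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly analogous to: SELECT ?child WHERE { ?child rel:childOf ex:Alice }
children = [s for s, _, _ in match(triples, p=CHILD_OF,
                                   o="http://example.name#Alice")]
print(children)
```

A real triplestore indexes the tuples and answers such patterns efficiently over billions of statements, but the query model is the same.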
5. LOD | The LOD cloud (22.08.2017) [3]
[3] http://lod-cloud.net/
6. LOD at DANS | Static LOD
• A LOD graph is living – it keeps evolving
• We archive static snapshots of the graph
• LD is in plain ASCII – no complicated formats needed
• The archived static snapshot can be revived – the README file
accompanying the data describes the procedure
• Examples in EASY, the DANS online long-term archiving system [4]
• use search term “linked data”
• interesting examples: LOD Laundromat; CEDAR RDF database
[4] http://www.easy.dans.knaw.nl/
8. DANS LOD infrastructure
• LOD conversion tool that harvests public metadata from DANS systems
via the OAI-PMH protocol and converts it to the Turtle RDF format
• Virtuoso with SPARQL endpoint to store and query archived triples
(static)
• grlc to build Web APIs using shared SPARQL queries
• Timbuctoo Linked Data storage to keep different versions of
metadata harvested from DANS systems (turned into a schema)
• GraphQL endpoint integrated in Timbuctoo to query repository and
evaluate new links
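As a rough illustration of the first bullet, the harvest-and-convert step might look like the following sketch: parse one oai_dc record, as returned inside an OAI-PMH ListRecords response, and emit Turtle-style triples. The function name, sample record, and subject URI are hypothetical; the actual DANS conversion tool is more involved.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# A minimal oai_dc payload of the kind an OAI-PMH endpoint returns
# (illustrative data, not a real DANS record).
sample = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:creator>J. Doe</dc:creator>
</oai_dc:dc>"""

def record_to_turtle(xml_text, subject):
    """Convert the dc:* elements of one oai_dc record into
    Turtle statements about the given subject URI."""
    root = ET.fromstring(xml_text)
    lines = [f"@prefix dc: <{DC}> ."]
    for el in root:
        if el.tag.startswith("{" + DC + "}") and el.text:
            prop = el.tag.split("}", 1)[1]   # strip the namespace
            lines.append(f'<{subject}> dc:{prop} "{el.text}" .')
    return "\n".join(lines)

print(record_to_turtle(sample, "https://example.org/dataset/1"))
```

In the real pipeline this conversion runs over every harvested record, and the resulting Turtle is loaded into Virtuoso for SPARQL querying.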
9. What is Timbuctoo?
• Timbuctoo is an open source Linked Data repository system developed by Huygens ING
and specialized in handling interpretative and heterogeneous content. Timbuctoo is
specifically designed for academic research in the arts & humanities and is ideally suited
for research institutions, libraries and archives supporting scholars who follow a
hermeneutic methodology.
• Data upload options:
• Excel upload
• CSV upload
• DataPerfect upload
• remote repository upload with ResourceSync
10. Description of pipeline to archive
• Users deposit new datasets; metadata is updated over time
• Snapshots are taken regularly
• ResourceSync is the only option to get updated snapshots into the LOD
cloud without manual interaction
14. What is GraphQL?
• “GraphQL is a data query language developed internally by Facebook in 2012 before
being publicly released in 2015. It provides an alternative to REST and ad-hoc web service
architectures.”
• Wikipedia
• "GraphQL is a query language for your API, and a server-side runtime for executing
queries by using a type system you define for your data. GraphQL isn't tied to any specific
database or storage engine and is instead backed by your existing code and data.”
• The GraphQL endpoint provided by the Timbuctoo RDF storage allows visual Linked Data
exploration.
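To make the idea concrete, here is a hedged sketch of how a client could post a query to a GraphQL endpoint: the request is an HTTP POST whose JSON body carries the query string. The endpoint URL and field names below are invented and will differ from Timbuctoo's actual schema.

```python
import json
import urllib.request

# A hypothetical GraphQL query; real field names depend on the
# type system the server defines for its data.
query = """
{
  dataSets {
    dataSetId
  }
}
"""

def build_request(endpoint, query, variables=None):
    """Package a GraphQL query as the standard JSON-over-POST request."""
    payload = json.dumps({"query": query,
                          "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=payload,
        headers={"Content-Type": "application/json"})

req = build_request("https://example.org/v5/graphql", query)
# urllib.request.urlopen(req) would send it; omitted here (no network).
```

Unlike REST, the client names exactly the fields it wants, and the server resolves them against its type system in a single round trip.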
17. N-Quads U.D.
• RDF data set notations are like snapshots.
• We enrich them…
• What if we need to track changes in resulting new RDF file?
• How do we know which of these predicates has had a previous value?
• What if we want to add new triples?
• N-Quads itself is an extension of N-Triples; Timbuctoo supports both:
--- easy.nq 2017-12-14 11:18:16.057104790 +0200
+++ empty.nq 2017-12-14 12:08:18.772264550 +0200
@@ -1,35652 +0,0 @@
+<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Slochteren" .
+<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Sloten_NH" .
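The change-tracking questions above can be approached with plain set arithmetic: because N-Quads is line-based, with one statement per line, the difference between two snapshot files directly yields the added and removed statements. A small sketch with illustrative data:

```python
# Two snapshots of the same dataset, each as a set of N-Quads lines
# (abbreviated, illustrative statements).
old_snapshot = {
    '<easy:15960> <dc:location> "Slochteren" .',
    '<easy:15960> <dc:location> "Sloten_NH" .',
}
new_snapshot = {
    '<easy:15960> <dc:location> "Slochteren" .',
    '<easy:15960> <dc:title> "Example" .',
}

added = new_snapshot - old_snapshot      # statements introduced
removed = old_snapshot - new_snapshot    # statements that disappeared

print(sorted(added))
print(sorted(removed))
```

This is essentially what a textual diff of two .nq files shows, but computing it as sets makes it easy to feed the changes into an enrichment or versioning step.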