Presentation by Bernd Pulverer on EMBO's 'Source Data' and the next generation of open access given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
The document discusses the ISA infrastructure, which provides a framework for tracking metadata in bioscience experiments from data collection to sharing in linked data clouds. The infrastructure includes a metadata syntax, open source software tools, and a user community. It allows annotation of experimental metadata, materials, and processes using ontologies to make semantics explicit and enable integration and knowledge discovery. The infrastructure is growing with over 30 public and private resources adopting it to facilitate standards-compliant sharing of investigations across life science domains.
re3data - Registry of Research Data Repositories | Heinz Pampel
The document discusses re3data, a global registry of research data repositories. It provides background on funder and publisher data policies driving the need for data sharing. The registry aims to help researchers, funders and others find appropriate data repositories across disciplines. It covers over 1,300 repositories described using a metadata schema. Analysis of the registered repositories shows most are disciplinary focused and contain environmental and geoscience data which is mostly openly accessible. The registry is governed through an international board and technical developments include open APIs and integration with DataCite.
Bio-GraphIIn is a graph-based, integrative and semantically enabled repository for life science experimental data. It addresses the need for a system that supports retrospective data submissions, handles heterogeneous experimental data, and overcomes the fragmentation of existing data formats and databases. Bio-GraphIIn uses the Investigation/Study/Assay (ISA) framework and ontologies to semantically represent experimental metadata and enable rich queries across studies, with the goal of facilitating integrative data analysis.
This document discusses improving reproducibility and transparency in scholarly publishing by rewarding open sharing of data, methods, and results. It presents a case study using the ISA framework to describe experimental data and workflows, nanopublications to make assertions from results, and research objects to aggregate these scholarly artifacts. Combining these approaches could help address issues like increasing retractions by incentivizing open review, replication of analyses, and credit for sharing diverse scholarly contributions beyond publications. The document advocates extending this case study to provide community guidelines for integrated use of these models.
BioAssay Research Database - Presentation at the ChemAxon UGM 2013 | Andrea de Souza
The BARD platform aims to enable researchers to effectively use data from the Molecular Libraries Program (MLP) to generate new hypotheses. It provides tools for assay registration, querying and visualization of over 4,000 assays, 35 million compounds and 300 projects. These include intuitive guided queries, cross-assay views, and predictive models. The platform structures data using an Assay Definition Standard to integrate ontologies and represent assay metadata, results and experiments. This supports knowledge discovery and hypothesis generation from diverse chemical biology datasets.
The document provides an overview of the Donders Repository, which aims to securely store original research data, document the research process, and make data accessible to researchers and the public. It describes the procedural design, including the different roles, collection types, and states. The technical architecture is based on iRODS software and scalable storage. The repository fits into researchers' workflows and supports the timeline of projects from initiation to data sharing. Standards like BIDS help make neuroimaging data FAIR (Findable, Accessible, Interoperable, Reusable).
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
The document discusses the ISA (Investigation/Study/Assay) framework for enabling data reuse and reproducibility in bioscience research. The ISA framework provides a generic format for rich experimental descriptions and an infrastructure of open source software tools. It aims to minimize the burden of reporting, curating, sharing data and metadata from bioscience experiments to enable comprehension, reuse of data, and reproducibility. The framework promotes community engagement to develop community standards and document use cases.
This document summarizes a presentation about publishing research data with Scientific Data. It discusses the benefits of sharing research data, including generating more analyses and reuse. It outlines Scientific Data's process for publishing Data Descriptors, which include both human-readable articles and machine-readable metadata. Data Descriptors can be published at any point in the research process. The presentation notes that Data Descriptors provide credit for data generators, enable discovery and reuse of data, and have resulted in data being cited and reused in different fields and by the public.
The document discusses the ISA infrastructure, which provides a standardized format (ISA-TAB) for experimental metadata and data exchange. It can be used across various domains like toxicology, systems biology, and nanotechnology. The Risa R package integrates experimental metadata with analysis and allows updating metadata. Nature Scientific Data is a new publication for describing valuable datasets. The ISA framework has been adopted by over 30 public and private resources and is growing in use for facilitating reuse of investigations in various life science domains. Toxicity examples include EU projects on predictive toxicology and a rat study of drug candidates. Questions can be directed to the ISA tools group.
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume-related challenges, including the use of high-performance computing. In this talk, we will mainly focus on the other challenges from the perspective of collaborative sharing and reuse of a broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration, and discovery capabilities. We will borrow examples of tools and capabilities from state-of-the-art work in supporting physicists (including astrophysicists) [1], life sciences [2], and material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied, practice-oriented talk will complement more vision-oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Citing data in research articles: principles, implementation, challenges - an... | FAIRDOM
Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Using Open Science to advance science - advancing open data | Robert Oostenveld
This document discusses using open science practices like open data to advance science. It notes the benefits of open data like improved reproducibility and opportunities for data mining. However, sharing neuroimaging and other human subject data presents challenges regarding data size, sensitivity, and privacy regulations. The document promotes using the Brain Imaging Data Structure (BIDS) format to organize data in an open, standardized way. It also discusses the gradient between personal/identifiable data that requires protection and de-identified research data that can be shared, as well as legal constraints and appropriate repositories for sharing data responsibly.
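As a minimal illustration of the BIDS convention promoted above, the sketch below builds BIDS-style relative paths (subject directory, datatype directory, and an entity-based filename with `sub-` and `task-` labels). It covers only a tiny subset of the specification and is no substitute for the BIDS validator; the helper name and parameters are our own.

```python
from pathlib import PurePosixPath

def bids_path(subject, datatype, suffix, task=None, ext=".nii.gz"):
    """Build a BIDS-style relative path, e.g.
    sub-01/func/sub-01_task-rest_bold.nii.gz."""
    entities = [f"sub-{subject}"]
    if task is not None:
        entities.append(f"task-{task}")
    entities.append(suffix)  # BIDS suffix, e.g. T1w or bold
    filename = "_".join(entities) + ext
    return str(PurePosixPath(f"sub-{subject}", datatype, filename))

# An anatomical scan and a resting-state functional run for subject 01
anat = bids_path("01", "anat", "T1w")
func = bids_path("01", "func", "bold", task="rest")
```

Organizing files this way is what makes the shared data machine-discoverable: tools can locate every subject's T1w image without dataset-specific configuration.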
Data collection is the process of systematically gathering information to answer research questions. Accurate data collection is essential to maintaining research integrity. Issues that can compromise integrity include errors in data collection instruments or procedures. Quality assurance and quality control help ensure integrity. Quality assurance occurs before data collection through standardized protocols and manuals. Quality control occurs during and after collection through review and validation of data. Maintaining integrity supports accurate conclusions and prevents wasted resources.
On community-standards, data curation and scholarly communication - BITS, Ita... | Susanna-Assunta Sansone
The document discusses the vision of a "connected digital research enterprise" where researchers can more easily find and collaborate with others based on shared data and outputs. It describes a scenario where Researcher X discovers commonalities in data with Researcher Y, views Y's datasets and publications, and initiates a collaboration. Their joint work is captured and indexed, and a company utilizes some of the outputs while providing funding back to the researchers. The vision aims to more closely connect scientific work through shared digital resources.
This document presents a proposal for an individual project on opportunistic persistent data storage. The proposal discusses opportunistic networks and their social network properties. It aims to efficiently implement persistent data storage in opportunistic networks by utilizing social network properties. Key research questions are how to implement efficient storage protocols using social properties and handling replicas with low overhead. The study will select server nodes based on social properties and implement a storage model with write and read quorums. Evaluation will use the ONE simulator and synthetic traces to analyze the approach.
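The write and read quorums mentioned in the proposal can be sketched generically. The proposal's actual protocol and its social-property-based node selection are not detailed in the summary, so the replica layout and the `read_latest` helper below are illustrative assumptions, not the study's implementation.

```python
def quorums_consistent(n, w, r):
    """With n replicas, a read is guaranteed to intersect the latest
    write when the quorum sizes satisfy w + r > n."""
    return w + r > n

def read_latest(replicas, r):
    """Read from r replicas and return the value with the highest version.

    In practice the r nodes would be reachable server nodes chosen for
    their social properties; here we simply take the first r."""
    sample = replicas[:r]
    return max(sample, key=lambda rep: rep["version"])["value"]

# Hypothetical replica set of size 3, one copy stale
replicas = [
    {"version": 2, "value": "new"},
    {"version": 1, "value": "old"},
    {"version": 2, "value": "new"},
]
```

The overlap condition is why quorum replication tolerates stale copies: any read quorum of size r must share at least one node with the last write quorum of size w.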
BioSharing.org - mapping the landscape of community standards, databases, dat... | Alejandra Gonzalez-Beltran
This document summarizes Alejandra González-Beltrán's presentation on BioSharing.org, which maps the landscape of community standards, databases, and data policies. It discusses how BioSharing aims to help stakeholders make informed decisions for data interoperability by curating crowdsourced information on existing standards and policies. It also describes how BioSharing integrates information from the MIBBI Project and is working with the MICheckout tool to help users create and use modular standards components.
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ... | Amanda Whitmire
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1 | Bruce Kozuma
The document discusses establishing a common cell line metadata registry at the Broad Institute to facilitate collaboration. It proposes using an institutional database as the canonical source, and ingesting data into local systems to link project-specific information to parental cell line data. This would create a shared registry of parental cell lines available to all groups, along with project-specific daughter cell lines. The goals are to standardize metadata, enable discovery of related work, and accelerate research progress.
re3data.org – Registry of Research Data Repositories | Heinz Pampel
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
Research data management (RDM) and the FAIR principles (Findable, Accessible, Interoperable, Reusable) are widely promoted as the basis for a shared research data infrastructure. Nevertheless, researchers involved in next generation sequencing (NGS) still lack adequate RDM solutions. NGS metadata is generally not stored together with the raw NGS data, but kept by individual researchers in separate files. This situation complicates RDM practice. Moreover, the (meta)data often does not meet the FAIR principles [6]. Consequently, a central FAIR-compliant repository is highly desirable to support NGS-related research. We have selected iRODS (integrated Rule-Oriented Data System) [3] as the basis for implementing a sequencing data repository because it allows storing data and metadata together. iRODS serves as scalable middleware that provides centralized, virtualized access to different storage facilities and supports different types of clients. This repository will be part of an ecosystem of RDM solutions that cover complementary phases of the research data life cycle in our organization (Academic Medical Center of the University of Amsterdam). We selected Virtuoso [5] to enrich the metadata from iRODS and to manage a triplestore for linked data. The metadata in the iCAT (iRODS' metadata catalogue) and the ontology in Virtuoso are kept synchronized by enforcing strict data manipulation policies. We have implemented a prototype to preserve raw sequencing data for one research group. Three iRODS client interfaces are used for different purposes: Davrods [4] for data and metadata ingestion and data retrieval; Metalnx-web [7] for administration, data curation, and repository browsing; and iCommands [2] for all tasks by advanced users. Different user profiles are defined (principal investigator, data curator, repository administrator), each with different access rights. New data is ingested by copying raw sequence files and the corresponding metadata file (a sample sheet) to the landing collection on iRODS. The sample sheet triggers an iRODS rule that extracts the metadata and registers it in the iCAT as AVUs (Attribute, Value, Unit). Ontology files are registered in Virtuoso. The sequence files are copied to the persistent collection and made uniquely identifiable based on metadata. All steps are recorded in a report file that enables monitoring and tracking of progress and faults. Here we describe the design and implementation of the prototype, and discuss the first assessment results. Initial results indicate that the proposed solution is acceptable and fits the researchers' workflow well.
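In the prototype, metadata extraction runs server-side as an iRODS rule; as a rough client-side sketch of just that step, the following parses a sample sheet (CSV) into the kind of (attribute, value, unit) triples registered in the iCAT. The column names are invented for illustration and the unit field is left empty, as a real sample sheet and rule may differ.

```python
import csv
import io

def sample_sheet_to_avus(sheet_text, unit=""):
    """Parse a CSV sample sheet into iRODS-style AVU triples.

    Each row yields one (attribute, value, unit) triple per column,
    keyed by the sample identifier in the first column."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    id_column = reader.fieldnames[0]
    avus = {}
    for row in reader:
        avus[row[id_column]] = [
            (attr, value, unit)
            for attr, value in row.items()
            if attr != id_column
        ]
    return avus

# Hypothetical two-sample sheet
sheet = """Sample_ID,Organism,Read_Length
S1,Homo sapiens,150
S2,Mus musculus,100
"""
avus = sample_sheet_to_avus(sheet)
```

In the actual repository these triples would be attached to the corresponding data objects in the iCAT, making the sequence files queryable by their metadata.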
Doing for Data what PubMed did for literature: DATS, a model for dataset description, dataset indexing, and data discovery.
Google Slides [https://goo.gl/cd5KKa] or SlideShare [https://goo.gl/c8DH5N]
Interlinking educational data to Web of Data (Thesis presentation) | Enayat Rajabi
This is a thesis presentation about interlinking educational data to the Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud.
Engaging Information Professionals in the Process of Authoritative Interlinki... | Lucy McKenna
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user-interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimal viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
The document discusses using metadata to find researchers within and across organizations. It provides an example of analyzing data from the CiNii and KAKEN databases to find collaborators of researchers at the National Institute of Informatics in Japan. Network analysis was performed and revealed 61 researchers with 1,832 collaborators based on CiNii data and 37 researchers with 421 collaborators based on KAKEN data. The analysis also examined collaboration networks within the Graduate University for Advanced Studies, which includes researchers from diverse domains across its 21 departments. The document emphasizes that while the data provides opportunities to explore collaboration, making services to easily support researchers remains important.
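Collaborator counts like those above are typically derived from a co-authorship graph built from bibliographic records. A minimal sketch, assuming hypothetical per-paper author lists rather than real CiNii or KAKEN data:

```python
from collections import defaultdict
from itertools import combinations

def collaborators(papers):
    """Build an undirected co-authorship graph: each paper's author
    list contributes an edge between every pair of its authors."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical publication records: each inner list is one paper's authors
papers = [["A", "B"], ["A", "C", "D"], ["B", "D"]]
g = collaborators(papers)
```

A researcher's collaborator count is then simply the size of their neighbor set, and cross-database comparisons (as with CiNii versus KAKEN) amount to building one graph per source.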
Impact of climate change policy on the National Electricity Market | Engineers Australia
The document discusses the changing electricity system in Australia and the challenges this poses for power system operations. Major drivers of change include policies around renewable energy targets and carbon pricing. This will result in new types of power plants, possible early retirements of coal plants, and changes to demand patterns. The operator, AEMO, is enhancing its operations planning process to look ahead two years and consider scenarios around potential changes in generation and demand. This aims to help AEMO identify trends and ensure operational practices can effectively manage the changing power system.
The document provides an overview of the Donders Repository, which aims to securely store original research data, document the research process, and make data accessible to researchers and the public. It describes the procedural design including different roles, collection types, and states. The technical architecture is based on IRODS software and scalable storage. The repository fits into researchers' workflows and supports the timeline of projects from initiation to data sharing. Standards like BIDS help make neuroimaging data FAIR (Findable, Accessible, Interoperable, Reusable).
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
The document discusses the ISA (Investigation/Study/Assay) framework for enabling data reuse and reproducibility in bioscience research. The ISA framework provides a generic format for rich experimental descriptions and an infrastructure of open source software tools. It aims to minimize the burden of reporting, curating, sharing data and metadata from bioscience experiments to enable comprehension, reuse of data, and reproducibility. The framework promotes community engagement to develop community standards and document use cases.
This document summarizes a presentation about publishing research data with Scientific Data. It discusses the benefits of sharing research data, including generating more analyses and reuse. It outlines Scientific Data's process for publishing Data Descriptors, which include both human-readable articles and machine-readable metadata. Data Descriptors can be published at any point in the research process. The presentation notes that Data Descriptors provide credit for data generators, enable discovery and reuse of data, and have resulted in data being cited and reused in different fields and by the public.
The document discusses the ISA infrastructure, which provides a standardized format (ISA-TAB) for experimental metadata and data exchange. It can be used across various domains like toxicology, systems biology, and nanotechnology. The Risa R package integrates experimental metadata with analysis and allows updating metadata. Nature Scientific Data is a new publication for describing valuable datasets. The ISA framework has been adopted by over 30 public and private resources and is growing in use for facilitating reuse of investigations in various life science domains. Toxicity examples include EU projects on predictive toxicology and a rat study of drug candidates. Questions can be directed to the ISA tools group.
Amit Sheth with TK Prasad, "Semantic Technologies for Big Science and Astrophysics", Invited Plenary Presentation, at Earthcube Solar-Terrestrial End-User Workshop, NJIT, Newark, NJ, August 13, 2014.
Like many other fields of Big Science, Astrophysics and Solar Physics deal with the challenges of Big Data, including Volume, Variety, Velocity, and Veracity. There is already significant work on handling volume related challenges, including the use of high performance computing. In this talk, we will mainly focus on other challenges from the perspective of collaborative sharing and reuse of broad variety of data created by multiple stakeholders, large and small, along with tools that offer semantic variants of search, browsing, integration and discovery capabilities. We will borrow examples of tools and capabilities from state of the art work in supporting physicists (including astrophysicists) [1], life sciences [2], material sciences [3], and describe the role of semantics and semantic technologies that make these capabilities possible or easier to realize. This applied and practice oriented talk will complement more vision oriented counterparts [4].
[1] Science Web-based Interactive Semantic Environment: http://sciencewise.info/
[2] NCBO Bioportal: http://bioportal.bioontology.org/ , Kno.e.sis’s work on Semantic Web for Healthcare and Life Sciences: http://knoesis.org/amit/hcls
[3] MaterialWays (a Materials Genome Initiative related project): http://wiki.knoesis.org/index.php/MaterialWays
[4] From Big Data to Smart Data: http://wiki.knoesis.org/index.php/Smart_Data
Citing data in research articles: principles, implementation, challenges - an...FAIRDOM
Prepared and presented by Jo McEntyre (EMBL_EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Using Open Science to advance science - advancing open data Robert Oostenveld
This document discusses using open science practices like open data to advance science. It notes the benefits of open data like improved reproducibility and opportunities for data mining. However, sharing neuroimaging and other human subject data presents challenges regarding data size, sensitivity, and privacy regulations. The document promotes using the Brain Imaging Data Structure (BIDS) format to organize data in an open, standardized way. It also discusses the gradient between personal/identifiable data that requires protection and de-identified research data that can be shared, as well as legal constraints and appropriate repositories for sharing data responsibly.
Data collection is the process of systematically gathering information to answer research questions. Accurate data collection is essential to maintaining research integrity. Issues that can compromise integrity include errors in data collection instruments or procedures. Quality assurance and quality control help ensure integrity. Quality assurance occurs before data collection through standardized protocols and manuals. Quality control occurs during and after collection through review and validation of data. Maintaining integrity supports accurate conclusions and prevents wasted resources.
On community-standards, data curation and scholarly communication - BITS, Ita...Susanna-Assunta Sansone
The document discusses the vision of a "connected digital research enterprise" where researchers can more easily find and collaborate with others based on shared data and outputs. It describes a scenario where Researcher X discovers commonalities in data with Researcher Y, views Y's datasets and publications, and initiates a collaboration. Their joint work is captured and indexed, and a company utilizes some of the outputs while providing funding back to the researchers. The vision aims to more closely connect scientific work through shared digital resources.
This document presents a proposal for an individual project on opportunistic persistent data storage. The proposal discusses opportunistic networks and their social network properties. It aims to efficiently implement persistent data storage in opportunistic networks by utilizing social network properties. Key research questions are how to implement efficient storage protocols using social properties and handling replicas with low overhead. The study will select server nodes based on social properties and implement a storage model with write and read quorums. Evaluation will use the ONE simulator and synthetic traces to analyze the approach.
BioSharing.org - mapping the landscape of community standards, databases, dat...Alejandra Gonzalez-Beltran
This document summarizes Alejandra González-Beltrán's presentation on BioSharing.org, which maps the landscape of community standards, databases, and data policies. It discusses how BioSharing aims to help stakeholders make informed decisions for data interoperability through curating crowdsourcing information on existing standards and policies. It also describes how BioSharing integrates information from the MIBBI Project and is working with the MICheckout tool to help users create and use modular standards components.
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
A workshop as part of the International Digital Curation Conference 2016 on DMP development and support. This presentation demonstrates how we can use data management plans as a source of information to better understand researcher data stewardship practices and how to support them. Be sure to see the slide notes to better understand the presentation (most slides are just photos/icons).
2016 Bio-IT World Cell Line Coordination Poster 2016-04-05v1Bruce Kozuma
The document discusses establishing a common cell line metadata registry at the Broad Institute to facilitate collaboration. It proposes using an institutional database as the canonical source, and ingesting data into local systems to link project-specific information to parental cell line data. This would create a shared registry of parental cell lines available to all groups, along with project-specific daughter cell lines. The goals are to standardize metadata, enable discovery of related work, and accelerate research progress.
re3data.org – Registry of Research Data RepositoriesHeinz Pampel
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015| Potsdam, November 26, 2015
Research data management (RDM) and the FAIR principles (Findable, Accessible, Interoperable, Reusable) are widely
promoted as basis for a shared research data infrastructure. Nevertheless, researchers involved in next generation
sequencing (NGS) still lack adequate RDM solutions. The NGS metadata is generally not stored together with the raw
NGS data, but kept by individual researchers in separate files. This situation complicates RDM practice. Moreover,
the (meta)data often does not meet the FAIR principles [6]. Consequently, a central FAIR-compliant repository
is highly desirable to support NGS-related research. We have selected iRODS (integrated Rule-Oriented Data
System) [3] as the basis for implementing a sequencing data repository because it allows storing both data and metadata
together. iRODS serves as scalable middleware to access different storage facilities in a centralized and virtualized
way, and supports different types of clients. This repository will be part of an ecosystem of RDM solutions that
cover complementary phases of the research data life cycle in our organization (Academic Medical Center of the
University of Amsterdam). We selected Virtuoso [5] to enrich the metadata from iRODS to enable the management
of a triplestore for linked data. The metadata in the iCAT (the iRODS metadata catalogue) and the ontology in Virtuoso
are kept synchronized by enforcement of strict data manipulation policies. We have implemented a prototype to
preserve raw sequencing data for one research group. Three iRODS client interfaces are used for different purposes:
Davrods [4] for data and metadata ingestion, data retrieval; Metalnx-web [7] for administration, data curation, and
repository browsing; and iCommands [2] for all tasks by advanced users. Different user profiles are defined (principal
investigator, data curator, repository administrator), with different access rights. New data is ingested by copying raw
sequence files and the corresponding metadata file (a sample sheet) to the landing collection on iRODS. An iRODS
rule is triggered by the sample sheet file; the rule extracts the metadata and registers it in the iCAT as AVUs (Attribute,
Value and Unit). Ontology files are registered into Virtuoso. The sequence files are copied to the persistent collection
and are made uniquely identifiable based on metadata. All the steps are recorded into a report file that enables
monitoring and tracking of progress and faults. Here we describe the design and implementation of the prototype,
and discuss the first assessment results. Initial results indicate that the proposed solution is acceptable and fits the
researchers' workflow well.
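The ingestion step described above (an iRODS rule extracting sample-sheet metadata and registering it as Attribute, Value, Unit triples) can be sketched in plain Python. This is a minimal sketch assuming a simple CSV sample sheet with a Sample_ID column; the column names and the AVU naming scheme are illustrative, not the prototype's actual format.

```python
import csv
import io

def sample_sheet_to_avus(sheet_text, unit="NGS"):
    """Turn a CSV sample sheet into (Attribute, Value, Unit) triples,
    mirroring how an ingestion rule might register metadata in the iCAT."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    avus = []
    for row in reader:
        sample = row.get("Sample_ID", "unknown")
        for attribute, value in row.items():
            if attribute == "Sample_ID" or not value:
                continue
            # Prefix attributes with the sample ID to keep AVUs unambiguous
            avus.append((f"{sample}.{attribute}", value, unit))
    return avus

# Hypothetical sample sheet content, for illustration only
sheet = """Sample_ID,Organism,Library_Prep
S001,Homo sapiens,TruSeq
S002,Mus musculus,Nextera
"""
for avu in sample_sheet_to_avus(sheet):
    print(avu)
```

In the real system the rule engine would fire on file arrival and write the AVUs into the iCAT; here the extraction logic alone is shown.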
Doing for Data what PubMed did for literature: DATS, a model for dataset description, dataset indexing and data discovery.
Google Slides [https://goo.gl/cd5KKa] or SlideShare [https://goo.gl/c8DH5N]
Interlinking educational data to Web of Data (Thesis presentation) – Enayat Rajabi
This is a thesis presentation about interlinking educational data to Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud
Engaging Information Professionals in the Process of Authoritative Interlinki... – Lucy McKenna
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user-interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimal viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
The document discusses using metadata to find researchers within and across organizations. It provides an example of analyzing data from the CiNii and KAKEN databases to find collaborators of researchers at the National Institute of Informatics in Japan. Network analysis was performed and revealed 61 researchers with 1,832 collaborators based on CiNii data and 37 researchers with 421 collaborators based on KAKEN data. The analysis also examined collaboration networks within the Graduate University for Advanced Studies, which includes researchers from diverse domains across its 21 departments. The document emphasizes that while the data provides opportunities to explore collaboration, making services to easily support researchers remains important.
Impact of climate change policy on the National Electricity Market – Engineers Australia
The document discusses the changing electricity system in Australia and the challenges this poses for power system operations. Major drivers of change include policies around renewable energy targets and carbon pricing. This will result in new types of power plants, possible early retirements of coal plants, and changes to demand patterns. The operator, AEMO, is enhancing its operations planning process to look ahead two years and consider scenarios around potential changes in generation and demand. This aims to help AEMO identify trends and ensure operational practices can effectively manage the changing power system.
Sharon Dawes (CTG Albany) Open data quality: a practical view – Open City Foundation
This document discusses open data quality and focuses on ensuring data is fit for its intended use. It notes that while open data aims to provide easy access, the value depends on the quality and how users apply the data. Quality issues can arise from how data is originally collected and maintained by different government systems. The document recommends open data providers adopt stewardship practices to maintain metadata and ensure quality, while users should approach data cautiously and look for ways to engage in data communities. Overall it promotes openness but also a realistic view of potential quality problems and the need for tools and strategies to maximize data value for various users.
This document discusses getting organizations and websites on the Linked Data web by following Linked Data principles. It provides an overview of Linked Data and its growth over time. The key Linked Data principles are to publish semantic data using RDF, enable linking between data through URIs, and use real URIs for identifying things. Adopting these principles allows data integration and querying across diverse datasets through standards like SPARQL. The document also discusses challenges in applying Linked Data to existing web content and standards like RDFa that embed semantic metadata directly in web pages.
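The core Linked Data idea described above, that shared URIs let queries span otherwise separate datasets, can be made concrete with a toy example. The following Python sketch stands in for RDF and SPARQL; the URIs and predicates are invented for illustration, and `match` plays the role of a basic graph pattern.

```python
# Toy triple store: two datasets share a URI, so one query spans both.
# All URIs and predicate names below are illustrative, not a real vocabulary.

dataset_a = [
    ("http://example.org/person/ada", "name", "Ada Lovelace"),
    ("http://example.org/person/ada", "bornIn", "http://example.org/place/london"),
]
dataset_b = [
    ("http://example.org/place/london", "name", "London"),
    ("http://example.org/place/london", "country", "England"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

merged = dataset_a + dataset_b
# Follow the shared URI: where was Ada born, and what is that place called?
(_, _, place), = match(merged, s="http://example.org/person/ada", p="bornIn")
(_, _, place_name), = match(merged, s=place, p="name")
print(place_name)
```

The second lookup succeeds only because both datasets use the same URI for London; that is exactly the integration benefit the Linked Data principles aim for.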
WWW2014 Overview of W3C Linked Data Platform 20140410 – Arnaud Le Hors
The document summarizes the Linked Data Platform (LDP) being developed by the W3C Linked Data Platform Working Group. It describes the challenges of using Linked Data for application integration today and how the LDP specification aims to address these by defining HTTP-based patterns for creating, reading, updating and deleting Linked Data resources and containers in a standardized, RESTful way. The LDP models resources as HTTP entities that can be manipulated via standard methods and represent their state using RDF, addressing questions around resource management that the original Linked Data principles did not.
What is hot on the web right now - A W3C perspective – Armin Haller
HTTP, HTML and the Web itself are entering their third decade of existence. Still, the Web continues to transform human communication, information sharing, commerce, education, and entertainment. Social networking, cloud computing, and the convergence of Web, television, video and online gaming are among the phenomena stretching the Web in exciting new directions. In this talk, Armin will present what the World Wide Web Consortium (W3C), which oversees and steers the development of new Web standards, is up to for the third decade of the Web. The W3C community is building an Open Web Platform that will enable the Web to grow and foster future innovation. This presentation covers technology highlights of 2011 for advancing the Web platform. Focus topics of this talk are the new HTML5 standard, the Data for Web Applications initiative which includes the next generation of RDF, and standards that allow people to create Semantic Web enabled Web Apps that have access to data from a variety of sources, including data-in-documents (RDFa) and data-from-databases (W3C's RDB2RDF).
Not intended as a talk but as a tale, this is a visual journey to the center of government and how government infrastructures keep data away from their real owners: the people, and how to overcome the current situation and empower them.
Read also the story that goes along (in English and Español) at:
http://datos.fundacionctic.org/2009/11/releasing-the-peoples-data/
Mobile Web Application Best Practices
is a W3C guidelines document, a Candidate Recommendation on its way to becoming an official Recommendation. In this workshop we will go in detail through the document's 35 sections, which describe techniques for improving the user experience of mobile web applications and warn against practices considered harmful.
Why are electronic services so little used? How to address the ... – Open Data @ CTIC
In the current national and international e-government landscape, the push for electronic services has rested on one premise: increase the number of online services with the highest possible degree of sophistication. But this premise has not delivered the expected returns. Although several administrations already offer more than a thousand online services at the maximum level of sophistication, their usage is low and, in some cases, worryingly low. What are the causes? How can they be addressed? How can investment in electronic services be optimized to increase returns and the number of users? In this talk, José Manuel Alonso briefly reviews the current landscape and its contradictory assumptions, and offers some recipes for optimizing investment, reducing costs and maximizing the impact of public services.
Accessible Design with HTML5 - HTML5DevConf.com May 21st San Francisco, 2012 ... – Raj Lal
Learn how to design an HTML5 application that supports people with disabilities, and why it's a good business decision. An accessible web application gives maximum reach to your application's information, functionality and benefits by allowing multiple input methods, different interaction models, and customization based on special needs and limited device support. The four major disabilities that affect user capabilities are visual, hearing, mobility (difficulty using the mouse), and cognitive disabilities, which relate to learning abilities. Learn how to use the latest technologies to accommodate these users in the user interface.
Public bodies and administrations publish more and more data on the Web every day. Sharing this data helps achieve greater transparency, enables better public service, and encourages greater public and commercial use and reuse of the data. Some administrations have even created catalogues or portals to make the data easier to find and use. Although the reasons vary from case to case, the problems and logistics of publishing such data are the same.
To help administrations open and share their data, the W3C eGovernment Interest Group has developed a set of guidelines. These simple steps emphasize standards and methods that encourage the publication of public sector data, enabling its reuse in new and innovative ways.
This talk reviews those guidelines and the ways of publishing public sector data, with emphasis on the best existing method, Linked Government Data, and shows real cases and applications of these methods in projects developed by Fundación CTIC for various administrations, as well as from other parts of the world.
The document introduces the concepts of semantics and the Semantic Web, and shows how it can teach computers to understand the meaning behind Web resources through ontologies, controlled vocabularies, and structured descriptions of resources and their relationships using RDF, OWL and SPARQL.
Presentation delivered in Dutch by Ludo Hendrickx and Joris Beek on 11 December 2013 at the Ministry of Interior, The Hague, The Netherlands. More information at: https://joinup.ec.europa.eu/community/ods/description
These slides were originally a tutorial presented for the SIG preceding the May 2009 meeting of the PRISM Forum.
They attempt to give a survey of the technologies, tools, and state of the world with respect to the Semantic Web as of the first half of 2009.
This document discusses research objects (ROs) and their role in reproducible science. It makes three key points:
1. Publications should convince readers of validity through reproducible results, but current systems do not fully facilitate reproducibility. ROs can address this by explicitly representing methods used.
2. Reproducibility reinforces results and is a key factor in scientific discovery. ROs provide a reproducible representation of methods.
3. ROs bundle together essential resources from a computational study, such as data, results, methods, people involved, and annotations for understanding, interpretation, and reuse. They support the full experimental lifecycle from problem definition to publication.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
This document summarizes Professor Carole Goble's presentation on making research more reproducible and FAIR (Findable, Accessible, Interoperable, Reusable) through the use of research objects and related standards and infrastructure. It discusses challenges to reproducibility in computational research and proposes bundling datasets, workflows, software and other research products into standardized research objects that can be cited and shared to help address these challenges.
Presentation slides on Open Science and research reproducibility. Presented by Gareth Knight (LSHTM Research Data Manager) on 18th September 2018, as part of an Open Science event for LSHTM Week 2018.
Presentation by Ruth Wilson on Nature Publishing Group's Scientific Data journal given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... – Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
This document discusses research objects and scientific workflows. It introduces research objects as a way to aggregate all elements needed to understand a research investigation, including datasets, results, experiments, and provenance. Scientific workflows are presented as tools for automating data-intensive scientific activities, with prospective and retrospective provenance capturing the intended and actual methods. The document outlines an approach to summarizing complex workflows using semantic annotations of workflow motifs and reduction primitives like collapse and eliminate. This distills provenance traces for improved understanding and querying.
Scientific Data is a new category of publication that provides detailed descriptions of scientifically valuable datasets to improve data reproducibility and reuse, with descriptors covering topics like methods, data records, and technical validation. These descriptors undergo a peer review process to assess completeness, consistency, integrity, and experimental rigor. The publication is hosted on Nature.com and aims to improve data discoverability, curation, and peer review through machine-readable metadata and clear links between data, descriptors, and related research papers.
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria,
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of "Research Objects" as coming in different types and as packaging all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (in the way we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded resources, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory, but also in practice with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
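The packaging idea in the abstract above (containers holding research components, with a Linked Data manifest describing context and relationships) can be illustrated with a toy manifest. This sketch is loosely inspired by Research Object manifests but uses simplified, illustrative field names; it is not the actual RO-Bundle schema.

```python
import json

# Illustrative Research Object-style manifest: one package aggregating
# data, a workflow, and a paper, with an annotation tying context to a
# component. All field names and paths here are hypothetical.

manifest = {
    "id": "urn:example:ro:1",
    "createdBy": {"name": "A. Researcher"},
    "aggregates": [
        {"uri": "data/results.csv", "mediatype": "text/csv"},
        {"uri": "workflow/analysis.cwl", "mediatype": "text/x-cwl"},
        {"uri": "paper/preprint.pdf", "mediatype": "application/pdf"},
    ],
    "annotations": [
        {"about": "data/results.csv",
         "content": "Raw output of the simulation run."},
    ],
}

# The manifest travels with the container as machine-readable context
print(json.dumps(manifest, indent=2))
```

The point is that the manifest, not the container format, carries the provenance and relationships that make the bundle interpretable later.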
The document discusses 10 habits for effective research data: 1) Preserve, 2) Archive, 3) Access, 4) Comprehend, 5) Discover, 6) Reproduce, 7) Trust, 8) Cite, 9) Use, and 10) Putting it all together. It provides examples for each habit, including data rescue challenges, the Olive Project to preserve executable content, metadata tools to improve data comprehension and sharing, initiatives for data indexing and identifiers to improve discovery and reproducibility, and proposals for making data more usable and integrated. The overall message is that adopting standards and collaborating across organizations can help research data achieve its full potential.
1. The document discusses the EOSC Dataset Minimum Information (EDMI) approach for exposing research data in the European Open Science Cloud (EOSC).
2. EDMI defines a set of 12 minimum metadata properties to facilitate finding and accessing datasets without being overly descriptive.
3. The approach was developed by engaging EOSC demonstrators and data repositories to propose methods for exposing metadata in a simple and sustainable way.
This document discusses best practices for managing research data to maximize its value and impact. It begins by outlining why proper data management is important for funding bodies, research institutions, and researchers themselves. It then describes common problems, such as a lack of standardized processes for recording experimental details. The rest of the document details several case studies and initiatives that illustrate different aspects of best practice, including preserving existing data, providing long-term access, improving comprehension, enabling discovery, ensuring reproducibility, validating data quality, and enabling proper citation of data as a research output. The overall goal is to establish data management standards and infrastructure that allow data to achieve its full potential.
re3data.org – a Registry of Research Data Repositories – Heinz Pampel
re3data.org is a global registry of research data repositories that aims to promote open sharing of research data. It indexes repositories from all academic disciplines to help researchers, funders, publishers, and institutions find appropriate places to store and share research data. The registry has grown significantly since its founding and now indexes over 1,000 repositories. It is a collaborative effort between several German and American institutions and works with other organizations to advance open data policies.
The project re3data.org–Registry of Research Data Repositories–began indexing research data repositories in 2012 and offers researchers, funding organizations, libraries and publishers an overview of the heterogeneous research data repository landscape.
Linked Data Publication of Live Music Archives and Analyses – seanb
This document summarizes work done to publish live music archive data and analyses as linked open data. It describes the Internet Archive Live Music Archive collection containing over 130,000 live music performances. Computational audio analysis was performed to extract features from the recordings, and both the raw metadata and analysis results were published as linked data using ontologies. Exploratory analysis tools were created to analyze relationships in the data and validate metadata. The goal was to integrate audio analyses with bibliographic metadata to better support search and discovery of music performances.
Slides from a keynote talk at the University of Manchester UK Schools Computer Animation Competition in July 2014.
http://animation14.cs.manchester.ac.uk/festival/
Linked Data Publication of Live Music Archives – seanb
The document summarizes work to publish metadata about a live music archive collection as linked data. Key points:
- The metadata from the Internet Archive's Live Music Archive of community-contributed live recordings is published as linked data using semantic technologies like RDF.
- The data is aligned with external resources like MusicBrainz, Geonames, and DBpedia to provide additional context.
- A SPARQL endpoint allows querying the structured data to extract interesting subcollections, such as performances by artists in their home towns.
The document discusses ontologies, vocabularies, and semantic web technologies. It provides an overview of RDF, RDF Schema, and OWL, including their semantics and capabilities. It describes how ontologies can constrain models and enable reasoning to derive inferences from class definitions and axioms. The document also addresses some common misconceptions regarding ontology modeling concepts.
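The kind of reasoning mentioned above, deriving inferences from class definitions, can be sketched with one RDFS entailment rule: instance types propagate along subClassOf. The class names below are invented for illustration, and a real reasoner covers far more than this single rule.

```python
# Minimal sketch of RDFS-style reasoning: take the transitive closure of
# rdfs:subClassOf, then propagate rdf:type along it (rule rdfs9).
# The ontology here is a toy example.

subclass_of = {
    ("Dog", "Mammal"),
    ("Mammal", "Animal"),
    ("Cat", "Mammal"),
}
instance_of = {("rex", "Dog")}

def infer_types(instances, subclasses):
    # Naive fixpoint for the transitive closure of subClassOf
    closure = set(subclasses)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # rdfs9: if x rdf:type C and C rdfs:subClassOf D, then x rdf:type D
    inferred = set(instances)
    for x, cls in instances:
        for sub, sup in closure:
            if cls == sub:
                inferred.add((x, sup))
    return inferred

types = infer_types(instance_of, subclass_of)
print(sorted(types))
```

Even this tiny example shows the point made in the document: axioms stated once in the ontology yield facts that were never asserted explicitly.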
The document introduces Sean Bechhofer and provides his contact information, including that he is from the University of Manchester, his email address, Twitter handle, and blog. It then lists several publications and projects related to reproducible and open research, including myExperiment and Research Objects, with the goal of facilitating exchange and reuse of digital knowledge. Key challenges discussed are how to move beyond linear paper publications to frameworks that better support reuse of digital assets like workflows and datasets.
This document discusses research objects as a framework for facilitating the exchange and reuse of digital knowledge. Research objects are defined as semantically rich aggregations of resources that support a research objective. They allow for workflows, data, documents and other resources to be bundled together and shared. The document outlines several motivating projects, challenges in developing research object models and vocabularies, and a vision for how research objects could allow research to be more efficient, effective and ethical through increased reuse of digital knowledge.
The document summarizes work investigating linked data approaches to support data sharing in the freshwater biology community. It describes initial experiments mapping existing datasets to support research questions on aquatic plant diversity. This included extracting queries and generating tables of correlated data for analysis. Challenges included inconsistent data formats and naming. The document questions the additional value of publishing linked data over simply providing open data and triplified data, and how to support small providers in annotating datasets and non-experts in writing SPARQL queries.
The document discusses SKOS (Simple Knowledge Organization System), a common data model for sharing and linking knowledge organization systems on the web. SKOS allows publishing thesauri and other controlled vocabularies as linked data. It provides a simple framework for representing concepts and semantic relationships to support tasks like searching across mapped thesauri. SKOS has been adopted by several communities and projects for integrating and mapping their vocabularies and terminology systems.
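One task SKOS supports, searching across a concept hierarchy, can be sketched with a toy broader/narrower structure: a query for a broad concept is expanded to all transitively narrower concepts. The concept labels below are invented, not drawn from any real thesaurus.

```python
# Sketch of SKOS-style query expansion: walk skos:narrower links so a
# search for a broad concept also matches items tagged with narrower
# ones. The hierarchy here is a toy example.

narrower = {
    "vehicles": ["cars", "bicycles"],
    "cars": ["electric cars"],
}

def expand(concept, narrower_map):
    """Return the concept plus everything transitively narrower than it."""
    result = [concept]
    stack = list(narrower_map.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in result:
            result.append(c)
            stack.extend(narrower_map.get(c, []))
    return result

print(expand("vehicles", narrower))
```

Mapped thesauri fit the same pattern: merging two narrower maps whose concepts have been aligned lets one expansion search across both vocabularies.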
The document discusses key issues around applying semantic web technologies to multimedia data including metadata, annotation, integration and inference. It raises questions around what standards are needed to address these areas and how existing infrastructures could be used or extended to support tasks like feature extraction, tagging, reconciling tags and ensuring consistency at scale. Bridging semantic web approaches and multimedia will depend on the specific tasks and use cases.
The debris of the ‘last major merger’ is dynamically young – Sérgio Sacani
The Milky Way’s (MW) inner stellar halo contains an [Fe/H]-rich component with highly eccentric orbits, often referred to as the
‘last major merger.’ Hypotheses for the origin of this component include Gaia-Sausage/Enceladus (GSE), where the progenitor
collided with the MW proto-disc 8–11 Gyr ago, and the Virgo Radial Merger (VRM), where the progenitor collided with the
MW disc within the last 3 Gyr. These two scenarios make different predictions about observable structure in local phase space,
because the morphology of debris depends on how long it has had to phase mix. The recently identified phase-space folds in Gaia
DR3 have positive caustic velocities, making them fundamentally different from the phase-mixed chevrons found in simulations
at late times. Roughly 20 per cent of the stars in the prograde local stellar halo are associated with the observed caustics. Based
on a simple phase-mixing model, the observed number of caustics is consistent with a merger that occurred 1–2 Gyr ago.
We also compare the observed phase-space distribution to FIRE-2 Latte simulations of GSE-like mergers, using a quantitative
measurement of phase mixing (2D causticality). The observed local phase-space distribution best matches the simulated data
1–2 Gyr after collision, and certainly not later than 3 Gyr. This is further evidence that the progenitor of the ‘last major merger’
did not collide with the MW proto-disc at early times, as is thought for the GSE, but instead collided with the MW disc within
the last few Gyr, consistent with the body of work surrounding the VRM.
ESPP presentation to EU Waste Water Network, 4th June 2024 “EU policies driving nutrient removal and recycling
and the revised UWWTD (Urban Waste Water Treatment Directive)”
The technology uses reclaimed CO₂ as the dyeing medium in a closed loop process. When pressurized, CO₂ becomes supercritical (SC-CO₂). In this state CO₂ has a very high solvent power, allowing the dye to dissolve easily.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste... – Sérgio Sacani
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... – University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati... – AbdullaAlAsif1
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature and presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study examines fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that D. colletei may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study of 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study contributes to a better understanding of viviparous fish in Borneo and to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
When I was asked to give a companion lecture in support of ‘The Philosophy of Science’ (https://shorturl.at/4pUXz) I decided not to walk through the detail of the many methodologies in order of use. Instead, I chose to employ a long standing, and ongoing, scientific development as an exemplar. And so, I chose the ever evolving story of Thermodynamics as a scientific investigation at its best.
Conducted over a period of >200 years, Thermodynamics R&D, and application, benefitted from the highest levels of professionalism, collaboration, and technical thoroughness. New layers of application, methodology, and practice were made possible by the progressive advance of technology. In turn, this has seen measurement and modelling accuracy continually improved at a micro and macro level.
Perhaps most importantly, Thermodynamics rapidly became a primary tool in the advance of applied science/engineering/technology, spanning micro-tech, to aerospace and cosmology. I can think of no better a story to illustrate the breadth of scientific methodologies and applications at their best.
2. Publication
• Publications are about argumentation: convince the reader of the validity of a position
– Reproducible Results System: facilitates enactment and publication of reproducible research.
• Results are reinforced by reproducibility
– Explicit representation of method.
• Verifiability as a key factor in scientific discovery.
J. Mesirov, Accessible Reproducible Research, Science 327(5964), pp. 415-416, 2010. doi:10.1126/science.1179653
Stodden et al., Reproducible Research: Addressing the Need for Data and Code Sharing in Computational Science, Computing in Science and Engineering 12(5), pp. 8-13, 2010. doi:10.1109/MCSE.2010.113
C. Goble et al., Accelerating Scientists’ Knowledge Turns, Communications in Computer and Information Science, vol. 348, pp. 3-25, 2013. doi:10.1007/978-3-642-37186-8_1
4. Scientific Workflows
» Scientific workflows are at the heart of experimental science
› Enable automation of scientific methods
› Support experimental reproducibility
› Encourage best practices
» There is then a need to preserve these workflows
› Scientific development based on method reuse and repurposing
› Conservation is key
» Workflow preservation is a multidimensional challenge
› Representation of complex objects
› Decay analysis, diagnosis, and prevention
› Social objects that can be inspected, reused, repurposed
Preservation of scientific workflows in data-intensive science
5. Preservation
Technical
– Multi-step computational process
– Repeatable and comparative
– Explicit representation of computation
Social
– Virtual witnessing
– Transparent, precise, citable documentation
– Accurate provenance logs
– Reusable protocols, know-how, best practice
Can I review / repeat your method? Can I defend my method? Can I reuse / reproduce this method?
6. Context: Semantic Web and Linked Data
• SW: explicit machine-readable representation of information
• LD: a set of best practices for publishing and connecting data on the Web
1. Use URIs to name things
2. Use dereferenceable HTTP URIs
3. Provide useful content on lookup using standards
4. Include links to other stuff
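The four Linked Data rules above can be sketched in a few lines of Python. The `describe_resource` helper and the example URIs are purely illustrative, not part of any standard API:

```python
# A minimal sketch of the four Linked Data principles, stdlib only.
from urllib.parse import urlparse

def describe_resource(uri, links):
    """Build a simple description record for a resource:
    name it with a dereferenceable HTTP URI, and include links
    to other resources so clients can follow their nose."""
    parsed = urlparse(uri)
    assert parsed.scheme in ("http", "https"), "Rule 2: use HTTP URIs"
    return {
        "@id": uri,                       # Rule 1: URIs as names
        "seeAlso": list(links),           # Rule 4: link to other things
        # Rule 3: on lookup, serve useful content in a standard format
        "format": "application/ld+json",
    }

desc = describe_resource(
    "http://example.org/ro/experiment-1",   # hypothetical RO URI
    ["http://example.org/people/alice", "https://doi.org/10.1126/science.1179653"],
)
```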
7. Research Objects
• An aggregation object that bundles together experimental resources that are essential to a computational scientific study or investigation:
– data used;
– results produced in an experimental study;
– (computational) methods employed to produce and analyse that data;
– people involved in the investigation.
• Plus annotation information that provides additional information about both the bundle itself and the resources of the bundle
– descriptions
– provenance
9. Research Objects
• Three principles underlie the approach:
• Identity
– Referring to resources (and the aggregation itself)
• Aggregation
– Describing the aggregation structure and its constituent parts
• Annotation
– Associating information with aggregated resources.
10. Identity
• Mechanisms for referring to the resources that are aggregated within a Research Object
• URIs
– Web resources
• DOIs
– Documents/papers/datasets
• ORCID iDs
– Researchers
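As an illustration of these identifier schemes, here is a small sketch that maps bare DOIs and ORCID iDs onto resolvable HTTP URIs. The regular expressions are deliberately simplified, not full validators:

```python
import re

def normalize_identifier(value):
    """Map common scholarly identifiers to resolvable HTTP URIs.
    The patterns below are rough sketches of the DOI and ORCID syntax."""
    if value.startswith(("http://", "https://")):
        return value                                        # already a URI
    if re.match(r"^10\.\d{4,9}/\S+$", value):               # bare DOI
        return "https://doi.org/" + value
    if re.match(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$", value):  # bare ORCID iD
        return "https://orcid.org/" + value
    raise ValueError("unrecognised identifier: " + value)
```

For example, a DOI cited on the earlier slide normalises to `https://doi.org/10.1126/science.1179653`.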
11. Identifier Issues
• HTTP URIs provide both access and identification
• PIDs: Persistent Identifiers (e.g. DOIs) tend to resolve to human-readable landing pages
– With embedded links to further (possibly machine-readable) resources
• ROs seen as non-information resources with descriptive (RDF) metadata
– Redirection/negotiation
– Standard patterns for Linked Data resources
• Bidirectional mappings between URIs and PIDs
• Versioning through, e.g., Memento
H. Van de Sompel et al., Persistent Identifiers for Scholarly Assets and the Web: The Need for an Unambiguous Mapping, 9th International Digital Curation Conference
12. Aggregation
• Open Archives Initiative Object Reuse and Exchange (OAI-ORE) is a standard for describing aggregations of web resources
– http://www.openarchives.org/ore/
• Uses a Resource Map to describe the aggregated resources
• Proxies allow for statements about the resources within the aggregation
– Capturing context and viewpoints
• Several concrete serialisations
– RDF/XML, Atom, RDFa
Graceful Degradation
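A rough sketch of what an ORE-style Resource Map amounts to, using plain Python tuples as triples rather than an RDF library. The proxy URI scheme here is invented for illustration:

```python
# Sketch of an OAI-ORE Resource Map: an aggregation URI, the resources
# it aggregates, and one proxy per resource so statements can be made
# about a resource *in the context of* this aggregation.
ORE = "http://www.openarchives.org/ore/terms/"

def resource_map(aggregation_uri, resources):
    """Return (subject, predicate, object) triples for the aggregation."""
    triples = []
    for i, resource in enumerate(resources):
        triples.append((aggregation_uri, ORE + "aggregates", resource))
        proxy = aggregation_uri + "/proxy/" + str(i)  # hypothetical proxy URIs
        triples.append((proxy, ORE + "proxyFor", resource))
        triples.append((proxy, ORE + "proxyIn", aggregation_uri))
    return triples

triples = resource_map(
    "http://example.org/ro/1",
    ["http://example.org/data.csv", "http://example.org/workflow.t2flow"],
)
```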
13. Annotation
• The Open Annotation specification is a community-developed data model for annotation of web resources
– http://www.openannotation.org/spec/core/
• Developed by the W3C Open Annotation Community Group
• Allows for “stand-off” annotations
– Annotation as a first-class citizen
• Developed to fit with Web Architecture
Graceful Degradation
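The stand-off idea can be illustrated with a minimal annotation record in the spirit of the Open Annotation model: the annotation is its own resource with a body and a target, rather than markup embedded in the annotated document. The `oa:`-prefixed keys echo the vocabulary; the URIs are made up:

```python
# A minimal stand-off annotation sketch: annotation as a first-class
# resource linking a body (what is said) to a target (what it is about).
def annotate(body_uri, target_uri, motivation="oa:describing"):
    return {
        "@type": "oa:Annotation",
        "oa:hasBody": body_uri,      # e.g. a provenance description
        "oa:hasTarget": target_uri,  # the aggregated resource annotated
        "oa:motivatedBy": motivation,
    }

ann = annotate("http://example.org/ro/1/provenance.ttl",
               "http://example.org/ro/1/results.csv")
```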
14. Annotation Content
• Essential to the understanding and interpretation of the scientific outcomes captured by a Research Object, as well as the reuse of the resources within it:
– Provenance information about the experiments, the study, or any other experimental resources
– Evolution information about the Research Object and its resources
– Descriptions of computational methods or processes
– Dependency information or settings about the experiment executions
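A tiny illustration of the kind of workflow-run provenance such an annotation might capture, phrased with PROV-O-like term names. The `run_provenance` helper and all URIs are invented for the sketch:

```python
# Sketch of run provenance as triples: outputs were generated by the
# run, the run used the inputs, and the run was associated with an agent.
def run_provenance(activity, inputs, outputs, agent):
    records = []
    for out in outputs:
        records.append((out, "prov:wasGeneratedBy", activity))
    for inp in inputs:
        records.append((activity, "prov:used", inp))
    records.append((activity, "prov:wasAssociatedWith", agent))
    return records

prov = run_provenance(
    "http://example.org/runs/42",
    inputs=["http://example.org/data/input.csv"],
    outputs=["http://example.org/data/results.csv"],
    agent="http://example.org/people/alice",
)
```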
15. Core & Extensions
• Core model provides support for aggregation and annotation
• Extensions provide additional vocabularies for domain-specific tasks
• Workflow Provenance
– Information capturing workflow executions
• Workflow Description
– Abstractions describing processes, inputs and outputs
• Research Object Evolution
– Information describing change and “snapshots”
19. ROs and OAIS
• ROs as Information Packages in OAIS
• myExperiment as live/access repository
• ROHUB as archival repository
…preservation and access to preserved ROs as depicted in Figure 6. Optionally, an external repository may be used to support the frequently evolving research objects. The repositories may be housed in a single or multiple physical repositories, and use the same or differing technologies (e.g. a repository may use a digital preservation solution for the Preservation Repository and a specialised digital library solution for the Access Repository). Additionally, as the Preservation Repository does not have the same interactive use requirements as the access and live repositories, it could be implemented with slower (or offline) storage alternatives.
Figure 6. Conceptual Archival System Storage Architecture.
20. SCAPE: Planning and Watch
• SCAPE project concerned with Digital Preservation
• Planning and Watch infrastructure to help monitor the state of a repository and co-ordinate appropriate actions
• Driven by policies
[Figure: Planning, Watch and Operations components monitoring the environment, users and repository, connected by plan, deploy, monitor, access, ingest/harvest and execution flows.]
http://www.scape-project.eu/
21. Wf4Ever: Monitoring and Watch
• Ideas applied to workflow preservation
• myExperiment and RODL as the repositories
• Watch targets: decay, service deprecation, data source monitoring, checklists, minimal models
[Figure: the SCAPE Planning/Watch/Operations loop applied to workflow repositories.]
22. Decay
• Survey of 92 Taverna workflows from myExperiment
• Volatile third-party resources
• Missing data
• Missing execution environments
• Poor descriptions
Belhajjame et al., Why Workflows Break — Understanding and Combating Decay in Taverna Workflows, e-Science 2012. doi:10.1109/eScience.2012.6404482
Fig. 3. Summary of workflow decay causes: (a) an overview of the decay causes; (b) workflow decay due to third-party resources.
23. Checklists and Validation
• Checklists widely used to support safety, quality and consistency
• Common in experimental science
– Expressing minimum information required
– Supporting “health” monitoring of workflow-centric ROs
• Checklists can be defined in terms of the RO model and its annotations
– Generic checklist service then executes against that model and the given annotations
– Provenance
24. Minim Data Model
An RO is “fully compliant”, “nominally compliant” or “minimally compliant” with a checklist if it satisfies all of its MAY, SHOULD or MUST items respectively.
Our Minim data model (see Figure 1) provides four core constructs to express a quality requirement.
Fig. 1. An overview of the Minim model schema: a Checklist has MUST, SHOULD and MAY Requirements; each Requirement is affirmed by a Rule (e.g. a QueryTestRule carrying a SPARQL query pattern and result modifier, or a SoftwareEnvRule); query results are checked by tests such as CardinalityTest (min/max cardinality), AggregationTest and AccessibilityTest (URI templates), and ExistsTest.
Zhao et al., A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Int. Workshop on Linked Science, 2013
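A toy evaluator conveys the checklist idea. This is only a sketch of the MUST/SHOULD/MAY logic, not the actual Minim engine, and the compliance labels other than “minimally compliant” are assumed from the truncated slide text:

```python
# Sketch of checklist evaluation: each requirement pairs a level with a
# test over the RO's annotations; the result is the strongest compliance
# level whose requirements (and all stricter ones) are all satisfied.
def evaluate(checklist, ro_annotations):
    satisfied = {"MUST": True, "SHOULD": True, "MAY": True}
    for level, test in checklist:
        if not test(ro_annotations):
            satisfied[level] = False
    if satisfied["MUST"] and satisfied["SHOULD"] and satisfied["MAY"]:
        return "fully compliant"
    if satisfied["MUST"] and satisfied["SHOULD"]:
        return "nominally compliant"    # label assumed, see lead-in
    if satisfied["MUST"]:
        return "minimally compliant"
    return "non-compliant"

checklist = [
    ("MUST",   lambda ro: "workflow" in ro),     # workflow definition present
    ("SHOULD", lambda ro: "provenance" in ro),   # provenance log present
    ("MAY",    lambda ro: "description" in ro),  # human-readable description
]
result = evaluate(checklist, {"workflow": "...", "provenance": "..."})
```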
27. RO Bundle
• A single, transferable object encapsulating the description and resources of an RO
– Download, transfer, publish
• ZIP-based format (resources) plus a manifest describing aggregation and annotations (description)
– Unpack with standard tooling
• JSON-LD as a representation for the manifest
– Lightweight linked-data format
– Compatible with existing JSON tooling and services
– PROV-O and OAC for annotations
http://wf4ever.github.io/ro/bundle/
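A rough sketch of writing such a bundle with only the standard library. The manifest layout is simplified, and the `.ro/manifest.json` path and `@context` URI are assumptions to be checked against the spec linked above:

```python
# Sketch of an RO Bundle: a ZIP of resources plus a JSON manifest
# describing what the bundle aggregates.
import json
import zipfile

def write_bundle(path, resources):
    manifest = {
        "@context": "https://w3id.org/bundle/context",  # assumed context URI
        "aggregates": [{"uri": name} for name in resources],
    }
    with zipfile.ZipFile(path, "w") as z:
        z.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name, content in resources.items():
            z.writestr(name, content)

write_bundle("example.bundle.zip", {
    "data/results.csv": "a,b\n1,2\n",
    "workflow/analysis.t2flow": "<workflow/>",
})
```

Because the container is plain ZIP, the bundle can still be unpacked (and the resources inspected) with standard tooling even by consumers that know nothing about the manifest.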
28. Bundling via git/Zenodo/figshare
• Scientist works with a local folder structure
– Version management via GitHub
– Local tooling produces metadata descriptions
– Metadata about the aggregation (and its resources) provided by a “hidden folder”
• Zenodo/figshare pull snapshots from GitHub
– Providing DOIs for the aggregations
– Additional release cycles can prompt new DOIs
37. Wrap Up
• Aggregation objects bundling together experimental resources that are essential to a computational scientific study or investigation
– Intended to support greater transparency and reproducibility
• Annotations provide additional information about the bundle and its contents
– Metadata is key here
• Use of existing standards, vocabularies and infrastructure
• Nascent tooling to support creation, management and publication
38. Thanks!
• All the members of the Wf4Ever team
– iSOCO: Intelligent Software Components S.A., Spain
– University of Manchester, School of Computer Science, Manchester, United Kingdom
– University of Oxford, Department of Zoology, Oxford, UK
– Poznan Supercomputing and Networking Center, Poznan, Poland
– IAA: Instituto de Astrofísica de Andalucía, Granada, Spain
– Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands
• Colleagues in Manchester’s Information Management Group
• RO Advisory Board Members
http://www.researchobject.org
http://www.wf4ever-project.org
Editor's Notes
Metadata to support reproducibility.
What does that mean?
What do we need to do?
How do we do it?
Will run through the approach that was taken, and some of the vocabs and standards that are being used to do it.
What’s the purpose of publication?
Publications are intended to present results and positions, along with arguments that reinforce those positions.
Reproducibility reinforces the validity of our positions.
May require us to include much more information than can be included in a paper: in particular, data sets and methods.
Understanding the different roles that are involved in supporting the scientific lifecycle and experimental process.
One of the key issues is that HTTP URIs serve multiple purposes. They are identifiers, but also serve as a mechanism for locating or accessing the content. PIDs, on the other hand, tend to involve a resolution or redirection process which guides us to the content. Commonly that resolution ends up on a landing page though – for example, DOIs usually resolve to a web page, which may then provide embedded links to further resources.
We can consider ROs as non-information resources (things whose distinguishing characteristics can’t be conveyed in a message). On resolving the ID for such a thing we get descriptive metadata about it (but not the thing itself). This is a common pattern used for Linked Data resources.
Herbert proposes a bidirectional mapping between PIDs and the HTTP URIs that provide access to the information about them. So we can go from PID to stuff, and from stuff to the PID that it is about.
Approaches like Memento could then be applied to support versioning.
I don’t think there are necessarily any deep problems lurking here – it’s more about the way in which services are set up and establishing convention and practice.
Lose this?
Local folder/file structures – experiences with our astronomy users.
Use github for version management. Local tooling produces metadata descriptions.
Example RO in zenodo
Example RO in figshare.
Cf Code as a research object.
Work by Dani Garijo of UPM. Web page generated from metadata about papers. RO includes information about the materials provided.
Systems biology bundling. Experiments in mapping between COMBINE archives and ROs.
http://co.mbine.org/documents/archive