EMBL Australian Bioinformatics Resource AHM - Data Commons (Vivien Bonazzi)
This document discusses the development of the NIH Data Commons, which aims to create a shared framework and infrastructure for biomedical data. It notes the growing volume of data being generated, increased support for data sharing, and the need for interoperability. The Data Commons framework treats data, tools, and publications as digital objects that are findable, accessible, interoperable and reusable. Current pilots are exploring the feasibility of the framework, including deploying reference datasets in the cloud, indexing data and tools, and a credits system for cloud resources. Considerations in fully realizing the commons include metrics, costs, standards, discoverability, policies, incentives and sustainability. The framework's relevance for supporting open data in Australia is also addressed.
dkNET Webinar: FAIR Data & Software in the Research Life Cycle, 01/22/2021 (dkNET)
Abstract
Good data stewardship is the cornerstone of knowledge, discovery, and innovation in research. The FAIR Data Principles address data creators, stewards, software engineers, publishers, and others to promote maximum use of research data. The principles can be used as a framework for fostering and extending research data services.
This talk will provide an overview of the FAIR principles and the drivers behind their development by a broad community of international stakeholders. We will explore a range of topics related to putting FAIR data into practice, including how and where data can be described, stored, and made discoverable (e.g., data repositories, metadata); methods for identifying and citing data; interoperability of (meta)data; best-practice examples; and tips for enabling data reuse (e.g., data licensing). Practical examples of how FAIR is applied will be provided along the way.
Presenter: Christopher Erdmann, Engagement, support, and training expert on the NHLBI BioData Catalyst project at University of North Carolina Renaissance Computing Institute
dkNET Webinars Information: https://dknet.org/about/webinar
Micah Altman, Harvard; Policy-based Data Management
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
The document discusses a global initiative to facilitate open access to scholarly resources and research data across boundaries by building a federation of registries. It provides use cases of how such a system could help postgraduate students, research project leaders, administrators, and ICT specialists discover and monitor globally accessible data relevant to their work. The proposed strategy is to create a "Register of Registries" that would enable consistent discovery services for finding data in collections through a standardized, interoperable model. An initial scoping meeting was held in 2007, with annual meetings since then to develop the strategy.
Smith RDAP11 NSF Data Management Plan Case Studies (ASIS&T)
MacKenzie Smith, MIT; NSF Data Management Plan Case Studies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Rots RDAP11 Data Archives in Federal Agencies (ASIS&T)
Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
This document discusses several studies on user engagement in research data curation. It finds that institutional repositories for data were developed without input from researchers, leading to systems that did not meet researchers' needs. Barriers to open data sharing included concerns over commercial use and maintaining ownership. Successful data curation requires understanding disciplinary differences and developing trusted relationships with researchers through dialogue early in projects.
FAIR Data Bridging from researcher data management to ELIXIR archives in the... (Carole Goble)
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based data management, and support researchers across their whole RDM lifecycle, navigating the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aim to do just that.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that authors’ credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020, 2:1-2, 108-121. https://doi.org/10.1162/dint_a_00033
This document discusses the Sustainable Environment - Actionable Data (SEAD) project. SEAD aims to provide data services to sustainability researchers by developing tools that address challenges like heterogeneous and small datasets. It plans to move data curation upstream, involve domain scientists, and leverage social media and metadata. SEAD will integrate these active curation services into a federated infrastructure to preserve datasets long-term. The project is led by researchers from multiple institutions and funded by the National Science Foundation.
Digital Library Federation - DataNets Panel presentation, Nov. 1st, 2011 (SEAD)
This document summarizes a panel discussion on the NSF funded Datanet partnerships program. It introduces the panelists from various Datanet projects including SEAD, TerraPop, Datanet Federation Consortium, and DataOne. It then provides more detail on the goals and strategies of the SEAD project, which aims to develop tools and services to address the needs of long-tail sustainability research by leveraging social curation and active metadata. SEAD works to move data curation upstream and engage researchers throughout the project using automated metadata and volunteered contributions.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at GFBio - de.NBI Summer School 2018 Riding the Data Life Cycle! Braunschweig Integrated Centre of Systems Biology (BRICS), 03 - 07 September 2018
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a... (SEAD)
This document discusses the Sustainable Environment Actionable Data (SEAD) project, which aims to lower the costs and increase the value of data curation through a data lifecycle approach. SEAD provides lightweight data services to support sustainability research, including secure project workspaces, active and social curation tools, and integrated lifecycle support for data from ingest to long-term preservation. By leveraging technologies like Web 2.0 and standards, SEAD simplifies and automates curation processes using metadata captured from data producers and users. This allows curation activities to begin earlier in the data lifecycle and be distributed across researchers and curators.
The document summarizes a workshop on geospatial metadata and spatial data. It discusses the importance of metadata for managing and sharing spatial datasets, providing key information about the data. It also covers metadata standards like FGDC, ISO 19115, and application profiles. The workshop includes presentations on the UK Academic Geospatial Metadata Application Profile and tools for creating metadata like the Geodoc Metadata Editor and Go-Geo portal.
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; bring FAIR by Design methodologies and platforms into the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
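The abstract above points to Schema.org as an industry-led route to making data findable. A minimal sketch of what such dataset markup can look like, with every name, identifier and URL invented for illustration:

```python
import json

# Minimal Schema.org "Dataset" record as JSON-LD, the kind of markup a
# repository embeds in a landing page so dataset search engines can harvest
# it. All names, identifiers and URLs below are hypothetical.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example plant phenotype measurements",
    "description": "Hypothetical measurements used to illustrate the markup.",
    "identifier": "https://doi.org/10.1234/example",  # persistent identifier (F)
    "license": "https://creativecommons.org/licenses/by/4.0/",  # reuse terms (R)
    "distribution": {
        "@type": "DataDownload",  # Schema.org type for a retrievable copy (A)
        "contentUrl": "https://repo.example.org/phenotypes.csv",
        "encodingFormat": "text/csv",  # standard media type (I)
    },
}

print(json.dumps(dataset, indent=2))
```

Embedded as JSON-LD in a landing page, a record like this is exactly what web crawlers consume, which is why the approach is described as industry-led rather than research-specific.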
The document proposes the creation of a federated cloud computing platform called "The Commons" to support biomedical data sharing and analysis across multiple cloud providers. Key points:
- The Commons would index metadata and digital objects across conformant public and private cloud providers.
- It would be funded by providing credits to investigators for storage and computing, creating competition among providers to offer better services at lower costs.
- A phased implementation is outlined to initially involve experienced users and later expand to all NIH grantees.
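As a rough illustration of the indexing idea above, the sketch below resolves a persistent object identifier to one of several cloud-hosted copies; all identifiers, provider names and URLs are hypothetical:

```python
# Hypothetical sketch of a Commons-style index: persistent object
# identifiers map to metadata plus the conformant providers holding copies.
index = {
    "commons:obj/7f3a": {
        "metadata": {"title": "Reference genome build", "format": "FASTA"},
        "locations": [
            {"provider": "cloud-a", "url": "https://cloud-a.example/7f3a"},
            {"provider": "cloud-b", "url": "https://cloud-b.example/7f3a"},
        ],
    },
}

def resolve(object_id, preferred_provider=None):
    """Return a download URL for a digital object, preferring a provider."""
    entry = index[object_id]
    for loc in entry["locations"]:
        if loc["provider"] == preferred_provider:
            return loc["url"]
    return entry["locations"][0]["url"]  # fall back to the first copy
```

A real Commons index would sit behind a web API and carry far richer metadata; the point is only that investigators reference the identifier, not a storage location, which is what lets providers compete on cost and service.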
The swings and roundabouts of a decade of fun and games with Research Objects (Carole Goble)
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to 17th Italian Research Conference on Digital Libraries
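RO-Crate, mentioned above, packages a dataset and its description into one directory carrying a single JSON-LD metadata file, ro-crate-metadata.json. A minimal sketch following the RO-Crate 1.1 layout, with the packaged file name invented for illustration:

```python
import json

# Minimal RO-Crate metadata following the 1.1 layout: one JSON-LD graph
# describing the metadata file itself, the root dataset (the packaged
# directory), and its files. "results.csv" is a hypothetical file name.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file descriptor, pointing at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: the directory being packaged
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis bundle",
            "hasPart": [{"@id": "results.csv"}],
        },
        {   # one packaged file
            "@id": "results.csv",
            "@type": "File",
        },
    ],
}

print(json.dumps(crate, indent=2))
```

Because the container is a plain directory and the metadata is plain JSON-LD, a crate stays readable without any special tooling, which is central to the Research Object motivation described above.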
D4Science Data infrastructure: a facilitator for FAIR data management (Research Data Alliance)
D4Science is a hybrid data infrastructure that integrates technologies to provide elastic access and usage of data and data management capabilities. It hosts over 50 virtual research environments for over 2500 scientists across 44 countries. D4Science aims to facilitate FAIR (Findable, Accessible, Interoperable, Re-usable) data management by assigning unique identifiers and rich metadata to resources, publishing catalogs to enable discovery, making resources available through standards, adding metadata in multiple formats, and requiring licenses and provenance to promote reuse.
Trusted data repositories and the CoreTrustSeal: webinar on March 13th 2018 from ANDS-Nectar-RDS, in which NIF discusses its journey to becoming a trusted data repository with the CoreTrustSeal. Presented by Andrew Mehnert.
Recordings and transcript available from the ANDS website: http://www.ands.org.au/news-and-events/presentations/2018
This document discusses drivers and organizational responses to research data management (RDM) maturity from transatlantic perspectives. It describes external funder mandates in the US and UK that require open sharing of research publications and data. Universities have responded by developing RDM policies, tools, expertise, and education/outreach for researchers. Key RDM components discussed include policies, storage and repository tools, expertise and staffing models, and outreach/education activities. Connecting electronic lab notebooks to other RDM infrastructure is presented as an approach to better integrate researcher workflows with institutional RDM. The document concludes with an invitation to provide comments on RDM maturity through an online survey.
Practical and Conceptual Considerations of Research Object Preservation (SEAD)
This document discusses research object (RO) frameworks for preserving digital research data. It addresses the challenges of research spanning long periods of time and involving complex, heterogeneous data that changes states. The research object framework aims to capture agents, states, relationships, and content to enable automation, reproducibility, and reuse of research. The framework defines three states for research objects - live, curated, and published. Live objects are works in progress, curated objects are packaged for preservation, and published objects are immutable and citable. The framework allows documentation of research processes and outputs to build trust and facilitate reuse.
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report - http://tinyurl.com/FAIR-EG
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
FAIR Data Bridging from researcher data management to ELIXIR archives in the...Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based, and support researchers across their whole RDM lifecycle, navigating the complexity this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aims to do just that.
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that author’s credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Science using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes,Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober FAIR Computational Workflows Data Intelligence 2020 2:1-2, 108-121 https://doi.org/10.1162/dint_a_00033.
This document discusses the Sustainable Environment - Actionable Data (SEAD) project. SEAD aims to provide data services to sustainability researchers by developing tools that address challenges like heterogeneous and small datasets. It plans to move data curation upstream, involve domain scientists, and leverage social media and metadata. SEAD will integrate these active curation services into a federated infrastructure to preserve datasets long-term. The project is led by researchers from multiple institutions and funded by the National Science Foundation.
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analysis and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality assured processing methods. The SARS-CoV-2 pandemic has significantly highlighted the value of workflows.
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services like registries and monitors. There is also recognition that workflows are first class, publishable Research Objects just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible), and citable so that author’s credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started and a whole ecosystem of tools, guidelines and best practices has been under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data Tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober. FAIR Computational Workflows. Data Intelligence 2020; 2(1-2): 108-121. https://doi.org/10.1162/dint_a_00033
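The core idea above, that a workflow is an explicit, inspectable description of steps and their data dependencies, can be sketched in plain Python. This is an illustrative toy with hypothetical step names, not the DSL of any real system such as Galaxy, Snakemake or Nextflow:

```python
# Toy illustration: a workflow as an explicit description of steps and the
# steps each one depends on. Because the method is declared as data, it can
# be inspected, shared and re-run -- the property the abstract highlights.
from graphlib import TopologicalSorter

# Hypothetical analysis steps; each maps to the steps it depends on.
workflow = {
    "download_reads": [],
    "quality_control": ["download_reads"],
    "align": ["quality_control"],
    "call_variants": ["align"],
    "report": ["call_variants", "quality_control"],
}

def run_order(deps):
    """Return one valid execution order that respects all dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(workflow)
```

A real workflow management system adds scheduling, caching, provenance capture and scale-out on top of exactly this kind of dependency description.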
Digital Library Federation - DataNets Panel presentation (Nov. 1st, 2011) (SEAD)
This document summarizes a panel discussion on the NSF funded Datanet partnerships program. It introduces the panelists from various Datanet projects including SEAD, TerraPop, Datanet Federation Consortium, and DataOne. It then provides more detail on the goals and strategies of the SEAD project, which aims to develop tools and services to address the needs of long-tail sustainability research by leveraging social curation and active metadata. SEAD works to move data curation upstream and engage researchers throughout the project using automated metadata and volunteered contributions.
A Big Picture in Research Data Management (Carole Goble)
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a... (SEAD)
This document discusses the Sustainable Environment Actionable Data (SEAD) project, which aims to lower the costs and increase the value of data curation through a data lifecycle approach. SEAD provides lightweight data services to support sustainability research, including secure project workspaces, active and social curation tools, and integrated lifecycle support for data from ingest to long-term preservation. By leveraging technologies like Web 2.0 and standards, SEAD simplifies and automates curation processes using metadata captured from data producers and users. This allows curation activities to begin earlier in the data lifecycle and be distributed across researchers and curators.
The document summarizes a workshop on geospatial metadata and spatial data. It discusses the importance of metadata for managing and sharing spatial datasets, providing key information about the data. It also covers metadata standards like FGDC, ISO 19115, and application profiles. The workshop includes presentations on the UK Academic Geospatial Metadata Application Profile and tools for creating metadata like the Geodoc Metadata Editor and Go-Geo portal.
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey towards wider accessibility and reusability of data and readiness for automation (I am one of the army of authors). Over the past five years FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly for the commercial and public sectors. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean, and how to implement them, has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; bring FAIR-by-Design methodologies and platforms into the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
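As an illustration of the Schema.org approach mentioned in the abstract above, a dataset landing page can embed a JSON-LD description that web crawlers and knowledge graphs can consume. A minimal sketch follows; the DOI, title and URLs are made-up placeholders, and real deployments typically follow the Schema.org/Bioschemas Dataset profile more fully:

```python
import json

# Minimal Schema.org "Dataset" description (all values are illustrative).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example assay measurements",               # hypothetical title
    "identifier": "https://doi.org/10.1234/example",    # hypothetical DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data/measurements.csv",
    },
}

# Serialised JSON-LD, ready to embed in a <script type="application/ld+json"> tag.
jsonld = json.dumps(dataset, indent=2)
```

Markup like this is one lightweight route to the F and A of FAIR: the record carries an identifier, a licence and a retrieval URL in a form machines can index.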
The document proposes the creation of a federated cloud computing platform called "The Commons" to support biomedical data sharing and analysis across multiple cloud providers. Key points:
- The Commons would index metadata and digital objects across conformant public and private cloud providers.
- It would be funded by providing credits to investigators for storage and computing, creating competition among providers to offer better services at lower costs.
- A phased implementation is outlined to initially involve experienced users and later expand to all NIH grantees.
The swings and roundabouts of a decade of fun and games with Research Objects (Carole Goble)
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to the 17th Italian Research Conference on Digital Libraries
D4Science Data infrastructure: a facilitator for FAIR data management (Research Data Alliance)
D4Science is a hybrid data infrastructure that integrates technologies to provide elastic access and usage of data and data management capabilities. It hosts over 50 virtual research environments for over 2500 scientists across 44 countries. D4Science aims to facilitate FAIR (Findable, Accessible, Interoperable, Re-usable) data management by assigning unique identifiers and rich metadata to resources, publishing catalogs to enable discovery, making resources available through standards, adding metadata in multiple formats, and requiring licenses and provenance to promote reuse.
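The FAIR facets listed above (unique identifiers, rich metadata, standards-based access, licences and provenance) can be pictured as a minimal checklist over a catalogue record. The sketch below is illustrative only; the field names are assumptions, not D4Science's actual schema:

```python
# Illustrative FAIR checklist over a catalogue record (field names are
# assumptions for this sketch, not D4Science's actual data model).
REQUIRED_FIELDS = {
    "identifier",   # F: globally unique, persistent identifier
    "title",        # F: rich, searchable metadata
    "access_url",   # A: retrievable via a standard protocol
    "format",       # I: data available in a known, standard format
    "license",      # R: clear conditions for reuse
    "provenance",   # R: where the data came from
}

def missing_fair_fields(record: dict) -> set:
    """Return the checklist fields a record has not filled in."""
    return {f for f in REQUIRED_FIELDS if not record.get(f)}

# A hypothetical record that covers everything except provenance.
record = {
    "identifier": "https://doi.org/10.1234/demo",
    "title": "Demo dataset",
    "access_url": "https://example.org/data.csv",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}
gaps = missing_fair_fields(record)
```

Catalogue software that enforces such a checklist at deposit time is one concrete way an infrastructure "requires licences and provenance to promote reuse".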
Trusted data repositories and the CoreTrustSeal: webinar on March 13th 2018 from ANDS-Nectar-RDS, in which NIF discuss their journey to becoming a trusted data repository with the CoreTrustSeal. Presented by Andrew Mehnert.
Recordings and transcript available from the ANDS website: http://www.ands.org.au/news-and-events/presentations/2018
This document discusses drivers and organizational responses to research data management (RDM) maturity from transatlantic perspectives. It describes external funder mandates in the US and UK that require open sharing of research publications and data. Universities have responded by developing RDM policies, tools, expertise, and education/outreach for researchers. Key RDM components discussed include policies, storage and repository tools, expertise and staffing models, and outreach/education activities. Connecting electronic lab notebooks to other RDM infrastructure is presented as an approach to better integrate researcher workflows with institutional RDM. The document concludes with an invitation to provide comments on RDM maturity through an online survey.
Practical and Conceptual Considerations of Research Object Preservation (SEAD)
This document discusses research object (RO) frameworks for preserving digital research data. It addresses the challenges of research spanning long periods of time and involving complex, heterogeneous data that changes states. The research object framework aims to capture agents, states, relationships, and content to enable automation, reproducibility, and reuse of research. The framework defines three states for research objects - live, curated, and published. Live objects are works in progress, curated objects are packaged for preservation, and published objects are immutable and citable. The framework allows documentation of research processes and outputs to build trust and facilitate reuse.
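The three-state lifecycle described above (live, then curated, then published) can be sketched as a small state machine. This is an illustration of the idea only; the SEAD research object framework defines the states, not this code:

```python
# Illustrative state machine for the research-object lifecycle described
# above: live objects are mutable works in progress, curated objects are
# packaged for preservation, published objects are immutable and citable.
TRANSITIONS = {
    "live": {"curated"},        # package a work-in-progress for preservation
    "curated": {"published"},   # release an immutable, citable version
    "published": set(),         # published objects never change state again
}

class ResearchObject:
    def __init__(self, name: str):
        self.name = name
        self.state = "live"     # every object starts as a work in progress

    def advance(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

ro = ResearchObject("field-survey-data")  # hypothetical object name
ro.advance("curated")
ro.advance("published")
```

Making the legal transitions explicit is what lets a repository guarantee that published objects stay immutable while live ones remain editable.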
Presentation investigating the state of FAIR practice and what is needed to turn FAIR data into reality, given at the Danish FAIR conference in Copenhagen on 20th November 2018. https://vidensportal.deic.dk/en/Programme/FAIR_Toolbox_Nov2018 The presentation reflects on recent FAIR studies and international initiatives and outlines the recommendations emerging from the European Commission's FAIR Data Expert Group report: http://tinyurl.com/FAIR-EG
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify a suitable repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
Virtual Research Environments supporting tailor-made data management service... (BlueBRIDGE)
Presented by Pasquale Pagano of CNR at the BlueBRIDGE Workshop at SeaTech Week 2016 in Brest, France. http://www.bluebridge-vres.eu/events/join-bluebridge-10th-biennial-sea-tech-week-brest-france
The document discusses the Enabling FAIR Data project, which aims to improve data sharing practices in earth and environmental sciences. It outlines the FAIR data principles, key stakeholders in the project including publishers and repositories, and outputs including a commitment statement, repository finder tool, and shared authoring guidelines. The next steps are to encourage more organizations to sign and implement the commitment statement and guidelines to promote open and interoperable data.
dkNET Office Hours: NIH Data Management and Sharing Mandate 05/03/2024 (dkNET)
Presenter: Jeffrey Grethe, PhD, Principal Investigator of NIDDK Information Network (dkNET), Center for Research in Biological Systems, University of California San Diego
For all proposals submitted on or after January 25, 2023, NIH requires the sharing of data from all NIH-funded studies. Do you have appropriate data management practices and sharing plans in place to meet these requirements? Have questions or need some help? Join the dkNET office hours to learn about NIH’s policy (NOT-OD-21-013) and resources that could help.
*Previous Office Hours Slides and Recording: https://dknet.org/rin/research-data-management
Upcoming Webinars Schedule: https://dknet.org/about/webinar
Virtual research environments for implementing long tail open science (BlueBRIDGE)
This document discusses virtual research environments (VREs) for supporting "long-tail open science". It defines VREs as operational environments that dynamically aggregate resources like data, services, and computing/storage for users. VREs aim to support collaborative research, reproducibility, and open sharing of data/findings while providing simplified access. The document outlines how VREs can be created on demand, integrated with applications/services, and used for collaborative experiments and workflows to enable repeatability and reuse of research. Real-world examples of VREs like D4Science are presented.
A brief introduction of dkNET (NIDDK Information Network; https://dknet.org) and the services and resources that are available, including Resource Reports, Authentication Reports, FAIR Data Services, Discovery Portal and Hypothesis Center.
Research Data Management in GLAM: Managing Data for Cultural Heritage (Sarah Anna Stewart)
Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018
dkNET Office Hours - "Are You Ready for 2023: New NIH Data Management and Sha... (dkNET)
This document summarizes an online meeting about resources available from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) to support researchers in implementing the new 2023 NIH data management and sharing policy. It discusses the NIDDK Central Repository, which acquires, maintains and distributes data and biospecimens from NIDDK-funded clinical studies. Eligibility and requirements for submitting resources to the repository are also covered.
re3data.org – Registry of Research Data Repositories (Heinz Pampel)
Heinz Pampel | GFZ German Research Centre for Geosciences, LIS
Maxi Kindling | Humboldt-Universität zu Berlin, Berlin School of Library and Information Science
Frank Scholze | Karlsruhe Institute of Technology, KIT Library
RDA-Deutschland-Treffen 2015 | Potsdam, November 26, 2015
This document summarizes Simon Hodson's presentation on open science and FAIR data developments globally. Some key points:
1) There is a growing policy push for open research data, with funders and organizations adopting data sharing policies based on FAIR data principles of findability, accessibility, interoperability, and reusability.
2) Initiatives are working to build the international ecosystem of open science, including components for reporting research outputs, persistent identifiers, data standards, data repositories, and criteria for trustworthy data.
3) The African Open Science Platform aims to lay the foundations for open science in Africa through frameworks for policy, incentives, training, and technical infrastructure development.
4) International
RDMkit, a Research Data Management Toolkit. Built by the Community for the ... (Carole Goble)
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community, to provide the RDM information they need. It is a framework for advice and best practice in RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit; aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
Turning FAIR into Reality: Final outcomes from the European Commission FAIR D... (Sarah Jones)
A multi-speaker presentation given by the European Commission FAIR Data Expert Group at SciDataCon, as part of International Data Week in Botswana in November 2018.
Simon Hodson, Chair of the Group explained the remit and background. Natalie Harrower outlined key concepts. Francoise Genova spoke on the recommendations related to research data culture. Daniel Mietchen addressed the infrastructure needed and our proposals for a FAIR ecosystem, and Sarah Jones spoke to the cultural aspects needed to drive change and outlined the FAIR Action Plan.
The report has been revised in light of the 500+ comments received as part of the open consultation and will be formally released on 23rd November as part of the Austrian Presidency events.
FAIR Data Management and FAIR Data Sharing (Merce Crosas)
Presentation at the Critical Perspectives on the Practice of Digital Archaeology symposium: http://archaeology.harvard.edu/critical-perspectives-practice-digital-archaeology
The document discusses sharing research data through open data platforms. It describes the CGIAR as uniquely positioned to collect agricultural data worldwide and argues that most CGIAR data should be archived and shared to increase its value. However, data archiving across CGIAR centers is currently poor. The document then discusses using the Dataverse platform to improve data sharing. Dataverse allows researchers to publish, share, cite, and analyze data. It also facilitates making data available while giving credit to data authors and institutions.
Similar to D4Science Data Infrastructure - Facilitator for a FAIR Data Management
The BIG picture - Advanced data visualization for SDG, basic stock assessment... (BlueBRIDGE)
This document discusses several applications that have been developed using EU e-infrastructures to support blue growth. It describes applications for advanced data visualization, stock assessment, environmental monitoring, modeling of invasive species and events, fisheries data analysis, protected areas impact mapping, and aquaculture monitoring. These applications provide access to biodiversity, environmental, and fisheries data and make use of computing resources and services to analyze data and produce outputs in reusable, interoperable formats.
Global Record of Stocks and Fisheries (GRSF) (BlueBRIDGE)
The Global Record of Stocks and Fisheries (GRSF) is an inventory of global stocks and fisheries records from multiple data providers. It assigns unique identifiers to standardized stock and fishery identifications. The GRSF knowledge base collates data and assigns identifiers. It has achieved the development of two virtual research environments and uptake is being considered by the Fisheries Resources Monitoring System partnership. The outcome could include using unique identifiers for product labeling and supporting international goals. The exploitation plan is to gradually populate the GRSF and present it at the FAO Committee on Fisheries in 2018.
BlueBRIDGE: Major Achievements & future vision (BlueBRIDGE)
BlueBRIDGE is a project funded by the European Union to support blue growth (sustainable use of ocean resources) through virtual research environments (VREs) and innovative applications based on EU e-infrastructures. The project aims to facilitate collaboration between scientists, SME innovators, and educators addressing blue growth challenges. It has created 54 VREs covering topics like aquaculture, biodiversity, and stock assessment. BlueBRIDGE also works to enhance e-infrastructure capabilities and integrate resources from multiple providers. Going forward, it seeks to maintain existing VREs and products through business agreements to ensure their long-term sustainability in supporting the blue growth community.
Managing tuna fisheries data at a global scale: the Tuna Atlas VRE (BlueBRIDGE)
On 18th January 2018, at 3pm CET, BlueBRIDGE hosted the webinar "Managing tuna fisheries data at a global scale: the Tuna Atlas VRE", which presented how, through the Tuna Atlas Virtual Research Environment (VRE), users can easily produce their own datasets of fisheries at regional, multi-regional or global scale, and how they can share these datasets in ways that allow other users to access, process and visualise them efficiently.
SeaDataCloud – further developing the pan-European SeaDataNet infrastructure ... (BlueBRIDGE)
SeaDataCloud is a project to further develop SeaDataNet, a pan-European infrastructure for managing marine and ocean data. It aims to update standards, improve services and products, adopt new technologies, and strengthen cooperation between SeaDataNet data centers and EUDAT e-infrastructure providers. Key goals include upgrading the Central Data Index service to use cloud computing, integrating data from other programs, and developing a virtual research environment for advanced data analysis and product development using marine data. EUDAT partners will contribute technical expertise to help achieve these objectives and enhance the management and use of oceanographic data across Europe.