This document summarizes a presentation on Research Objects given on October 29, 2018, in Amsterdam. It discusses how Research Objects can bundle together different components of a research investigation, such as data, methods, provenance, and results, to facilitate exchange, reproducibility, and preservation of research. The Research Object framework carries machine-readable metadata about these components. Examples are given of Research Objects that bundle workflows and computational experiments. Challenges and opportunities are discussed around developing community tools and standards to work with Research Objects.
Better software, better service, better research: The Software Sustainabilit... – Carole Goble
Ever spotted some great looking software only to discover you can’t get it, it doesn’t work, there is no documentation to help fix it and the developers don’t have the time or incentive to help? Ever produced some software that you want to be widely used or have folks contribute to? What’s the sustainability of that key platform/library/tool/database your lab uses day in and day out? Are you helping the providers? The same issues stand for Data (or as we now say “FAIR” Findable, Accessible, Interoperable, Reusable Data) and its metadata. Is anyone looking out for Europe’s data services – the datasets and analysis systems you use and make – the standards they use and the curators and developers who make them? Or is FAIR just a FAIRy story? I’ll describe how two organisations with quite different structures and approaches – the UK’s Software Sustainability Institute and the ELIXIR European Research Infrastructure for Life Science Data – are working for the common goal of better software, better service, and better research.
https://www.rothamsted.ac.uk/events/14th-international-symposium-integrative-bioinformatics
FAIR Workflows and Research Objects get a Workout – Carole Goble
So, you want to build a pan-national digital space for bioscience data and methods? That works with a bunch of pre-existing data repositories and processing platforms? So you can share FAIR workflows and move them between services? Package them up with data and other stuff (or just package up data for that matter)? How? WorkflowHub (https://workflowhub.eu) and RO-Crate Research Objects (https://www.researchobject.org/ro-crate) that’s how! A step towards FAIR Digital Objects gets a workout.
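To make the packaging idea concrete: an RO-Crate is a directory with a single JSON-LD metadata file, ro-crate-metadata.json, that describes the root dataset and its parts. Here is a minimal sketch in plain Python, with hypothetical file names; it follows the RO-Crate 1.1 structure but is an illustration, not the talk's own example.

```python
import json

# A minimal RO-Crate: a single JSON-LD file ("ro-crate-metadata.json")
# describing the root dataset and the files packaged with it.
# File names below are hypothetical.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # The metadata descriptor, pointing at the root dataset
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # The root dataset: the "crate" that gets moved between services
            "@id": "./",
            "@type": "Dataset",
            "name": "Example packaged analysis",
            "hasPart": [{"@id": "workflow.cwl"}, {"@id": "results.csv"}],
        },
        {"@id": "workflow.cwl", "@type": "File", "name": "Analysis workflow"},
        {"@id": "results.csv", "@type": "File", "name": "Workflow outputs"},
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```

Because the metadata file travels with the data, any service that understands the convention can unpack the same context: this is what lets a crate move between repositories and workflow platforms.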
Presented at DataVerse Community Meeting 2021
German Conference on Bioinformatics 2021
https://gcb2021.de/
FAIR Computational Workflows
Computational workflows capture precise descriptions of the steps and data dependencies needed to carry out computational data pipelines, analyses and simulations in many areas of Science, including the Life Sciences. The use of computational workflows to manage these multi-step computational processes has accelerated in the past few years, driven by the need for scalable data processing, the exchange of processing know-how, and the desire for more reproducible (or at least transparent) and quality-assured processing methods. The SARS-CoV-2 pandemic has further highlighted the value of workflows.
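To make "steps and data dependencies" concrete, here is a toy sketch (plain Python, hypothetical step names, not any particular workflow system): each step declares its inputs, so the same structure can be read as a method description or executed in dependency order.

```python
# A toy workflow: each step names its input steps, so the same structure is
# both a readable method description and an executable plan. Step names are
# illustrative only.
steps = {
    "clean":  {"inputs": [],                "run": lambda: "cleaned data"},
    "align":  {"inputs": ["clean"],         "run": lambda: "aligned reads"},
    "call":   {"inputs": ["align"],         "run": lambda: "variant calls"},
    "report": {"inputs": ["clean", "call"], "run": lambda: "summary report"},
}

def execute(steps):
    """Run steps in dependency order (a naive topological traversal)."""
    done, results = set(), {}
    while len(done) < len(steps):
        for name, step in steps.items():
            if name not in done and all(d in done for d in step["inputs"]):
                results[name] = step["run"]()
                done.add(name)
    return results

print(execute(steps))
```

Workflow management systems add scheduling, caching, containers and provenance on top, but this dual role (description plus execution) is the property the FAIR principles for workflows build on.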
This increased interest in workflows has been matched by the number of workflow management systems available to scientists (Galaxy, Snakemake, Nextflow and 270+ more) and the number of workflow services, like registries and monitors. There is also recognition that workflows are first-class, publishable Research Objects, just as data are. They deserve their own FAIR (Findable, Accessible, Interoperable, Reusable) principles and services that cater for their dual roles as explicit method description and software method execution [1]. To promote long-term usability and uptake by the scientific community, workflows (as well as the tools that integrate them) should become FAIR+R(eproducible) and citable, so that author credit is attributed fairly and accurately.
The work on improving the FAIRness of workflows has already started, and a whole ecosystem of tools, guidelines and best practices is under development to reduce the time needed to adapt, reuse and extend existing scientific workflows. An example is the EOSC-Life Cluster of 13 European Biomedical Research Infrastructures, which is developing a FAIR Workflow Collaboratory based on the ELIXIR Research Infrastructure for Life Science Data tools ecosystem. While there are many tools for addressing different aspects of FAIR workflows, many challenges remain for describing, annotating, and exposing scientific workflows so that they can be found, understood and reused by other scientists.
This keynote will explore the FAIR principles for computational workflows in the Life Sciences, using the EOSC-Life Workflow Collaboratory as an example.
[1] Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, and Daniel Schober, “FAIR Computational Workflows”, Data Intelligence 2020 2:1-2, 108-121. https://doi.org/10.1162/dint_a_00033
Presented at the FAIReScience 2021 online workshop (https://researchsoft.github.io/FAIReScience/),
virtually co-located with the 17th IEEE International Conference on eScience (eScience 2021)
RO-Crate: A framework for packaging research products into FAIR Research Objects – Carole Goble
RO-Crate: A framework for packaging research products into FAIR Research Objects, presented to the Research Data Alliance (RDA) Data Fabric/GEDE FAIR Digital Object meeting, 2021-02-25.
The swings and roundabouts of a decade of fun and games with Research Objects – Carole Goble
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications. Delivered virtually to the 17th Italian Research Conference on Digital Libraries.
FAIR Computational Workflows
Presented at WORKS 2021
https://works-workshop.org/
16th Workshop on Workflows in Support of Large-Scale Science
November 15, 2021
Held in conjunction with SC21: The International Conference for High Performance Computing, Networking, Storage and Analysis
Reproducibility - The myths and truths of pipeline bioinformatics – Simon Cockell
In a talk for the Newcastle Bioinformatics Special Interest Group (http://bsu.ncl.ac.uk/fms-bioinformatics) I explored the topic of reproducibility, looking at the pros and cons of pipelining analyses, as well as some tools for achieving this. I also considered some additional tools for enabling reproducible bioinformatics, and looked at the 'executable paper' and whether it represents the future of bioinformatics publishing.
Scientific Workflows: what do we have, what do we miss? – Paolo Romano
Presentation given on June 22, 2013, in Nice, at the CIBB 2013 International Workshop.
In collaboration with Paolo Missier, University of Newcastle upon Tyne, UK
A Big Picture in Research Data Management – Carole Goble
A personal view of the big picture in Research Data Management, given at the GFBio – de.NBI Summer School 2018 “Riding the Data Life Cycle!”, Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
A keynote on the FAIR Data Principles, given at the FAIRplus Innovation and SME Forum, Hinxton Genome Campus, Cambridge, UK, 29 January 2020: the history of the principles, issues with the principles, and speculations about the future.
Open Science: how to serve the needs of the researcher? – Carole Goble
Open science Jisc CNI roundtable 2018
Lightning talk
What should the future look like?
What are the essential characteristics we desire in a relatively near future system to support scholarly communication across the full research life cycle?
What are the key areas requiring attention, action, or investment today to reach the future that we want to reach?
What are the best opportunities to build upon existing practices, investments and infrastructure, both open and commercially provided?
Where must alternatives be developed?
What areas are already on good trajectories and can be left to evolve without additional intervention?
FAIR Data Bridging from researcher data management to ELIXIR archives in the... – Carole Goble
ISMB-ECCB 2021, NIH/ODSS Session, 27 July 2021
ELIXIR is the pan-national European Research Infrastructure for Life Science data, whose 23 national nodes and the EBI coordinate the development and long-term sustainability of domain public databases. FAIR services, policies and curation approaches aim to build a FAIR connected data ecosystem of trusted domain repositories, from ENA, HPA and EGA to specialised resources like CorkOakDB and PIPPA for plant phenotypes. But this is only one part of the data landscape and often the end of data’s journey. The nodes support research projects to operate “FAIR data first”, working with institutional and national platforms that are often generic or designed for project-based data management. We need to bridge between project-based and community-based data management, and support researchers across their whole RDM lifecycle, navigating the complexity of this ecosystem. The ELIXIR-CONVERGE project and its flagship RDMkit toolkit (https://rdmkit.elixir-europe.org) aim to do just that.
FAIRy stories: the FAIR Data principles in theory and in practice – Carole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR-by-Design methodologies and platforms in the researcher’s lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
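As a concrete flavour of the Schema.org approach mentioned above (a sketch with placeholder identifiers, not the speaker's own example): a repository landing page can embed a small JSON-LD description of a dataset so that web crawlers can find and index it.

```python
import json

# Schema.org JSON-LD for a dataset landing page. Embedded in the page inside
# <script type="application/ld+json">...</script> so crawlers can index it.
# The name, DOI and creator below are placeholders.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example plant phenotype measurements",
    "description": "Phenotype observations from a field trial.",
    "identifier": "https://doi.org/10.0000/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

print(json.dumps(dataset, indent=2))
```

Bioschemas builds on exactly this mechanism, profiling Schema.org types for life science resources so that markup stays lightweight for the providers who add it.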
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Building the FAIR Research Commons: A Data Driven Society of Scientists – Carole Goble
Science is knowledge work. The scientific method and scholarly communication are about facilitating “knowledge turns” – that is, the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR flow and availability of data, methods for automated processing, reproducible results and on a society of scientists coordinating and collaborating. We need to build a new form of Research Commons and I will present my steps towards this.
Presented at the symposium “The Future of a Data-Driven Society”, Maastricht University, 25 Jan 2018, which accompanied the 42nd Dies Natalis, where I was awarded an honorary doctorate.
Personal video:
https://www.youtube.com/watch?v=k5WN6KDDatU&index=4&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Video of the symposium:
https://www.youtube.com/watch?v=JN9eMMtCHf8&t=19s&index=6&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as coming in different types and as packaging all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (as we release software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded components, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers, and Linked Data provides the metadata framework for the container manifest construction and profiles. It’s not just theory but also practice, with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, “Why linked data is not enough for scientists”, Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X. https://doi.org/10.1016/j.future.2011.08.004
Reproducibility, Research Objects and Reality, Leiden 2016 – Carole Goble
Presented at the Leiden Bioscience Lecture, 24 November 2016, Reproducibility, Research Objects and Reality
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship have proved to be an effective rallying cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. It all sounds very laudable and straightforward. BUT…
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of “R” make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments depend on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These “Research Objects” are not fixed, just as research is not “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in data-driven computational life sciences through examples and stories from initiatives that I am involved in, and Leiden is too, including:
· FAIRDOM, which has built a Commons for Systems and Synthetic Biology projects, with an emphasis on standards smuggled in by stealth and efforts to affect sharing practices using behavioural interventions
· ELIXIR, the EU Research Data Infrastructure, and its efforts to exchange workflows
· Bioschemas.org, an ELIXIR-NIH-Google effort to support the finding of assets.
Reproducibility Using Semantics: An Overview – dgarijo
An overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Furthermore, Research Objects are introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This presentation was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
Research Objects: more than the sum of the parts – Carole Goble
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29, 2013. Discusses Research Objects and their relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
Reproducible Research: how could Research Objects help – Carole Goble
Reproducible Research: how could Research Objects help, given at the 21st Genomic Standards Consortium Meeting
Dates: May 20-23, 2019
https://press3.mcs.anl.gov/gensc/meetings/gsc21/
RO-Crate: packaging metadata love notes into FAIR Digital Objects – Carole Goble
Abstract
Slides available at: https://zenodo.org/record/7147703
The Helmholtz Metadata Collaboration aims to make the research data [and software] produced by Helmholtz Centres FAIR for their own and the wider science community by means of metadata enrichment [1]. Why metadata enrichment and why FAIR? Because the whole scientific enterprise depends on a cycle of finding, exchanging, understanding, validating, reproducing, integrating and reusing research entities across a dispersed community of researchers.
Metadata is not just “a love note to the future” [2]; it is a love note to today’s collaborators and peers. Moreover, a FAIR Commons must cater for the metadata of all the entities of research – data, software, workflows, protocols, instruments, geo-spatial locations, specimens, samples, people (as well as traditional articles) – and their interconnectivity. That is a lot of metadata love notes to manage, bundle up and move around. Notes written in different languages at different times by different folks, produced and hosted by different platforms, yet referring to each other, and building an integrated picture of a multi-part and multi-party investigation. We need a crate!
RO-Crate [3] is an open, community-driven, and lightweight approach to packaging research entities along with their metadata in a machine-readable manner. Following key principles – “just enough” and “developer and legacy friendliness” – RO-Crate simplifies the process of making research outputs FAIR while also enhancing research reproducibility and citability. As a self-describing and unbounded “metadata middleware” framework, RO-Crate shows that a little bit of packaging goes a long way to realise the goals of FAIR Digital Objects (FDO) [4], and to not just overcome platform diversity but celebrate it, while retaining the contextual integrity of an investigation.
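To illustrate the interlinked love notes (a sketch with hypothetical names and a placeholder ORCID, not an example from the talk): in an RO-Crate graph, a file's metadata points at a Person entity that carries its own metadata, so the notes refer to each other and build up the integrated picture of an investigation.

```python
import json

# Fragment of an RO-Crate @graph: the file's "author" note points at a
# Person entity that carries its own note. Names and the ORCID are
# placeholders, not real identifiers.
graph_fragment = [
    {
        "@id": "results.csv",
        "@type": "File",
        "name": "Analysis results",
        "author": {"@id": "https://orcid.org/0000-0000-0000-0000"},
        "license": "https://creativecommons.org/licenses/by/4.0/",
    },
    {
        "@id": "https://orcid.org/0000-0000-0000-0000",
        "@type": "Person",
        "name": "A. Researcher",
        "affiliation": "Example Institute",
    },
]

print(json.dumps(graph_fragment, indent=2))
```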
In this talk I will present why, and how, Research Object packaging eases metadata collaboration, using examples in big data and mixed-object exchange, mixed-object archiving and publishing, mass citation, and reproducibility. Some examples come from the HMC, others from EOSC, the USA and Australia, and from different disciplines.
Metadata is a love note to the future, RO-Crate is the delivery package.
[1] https://helmholtz-metadaten.de/en
[2] Scott, Jason, “The Metadata Mania”, http://ascii.textfiles.com/archives/3181, June 2011
[3] Soiland-Reyes, Stian et al. “Packaging Research Artefacts with RO-Crate”. Data Science, 2022; 5(2):97-138, DOI: 10.3233/DS-210053
[4] De Smedt K, Koureas D, Wittenburg P. “FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units”. Publications. 2020; 8(2):21. https://doi.org/10.3390/publications8020021
FAIR Data and Model Management for Systems Biology (and SOPs too!) – Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimising burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (the SysMO and EraSysAPP ERA-Nets and the ISBE ESFRI) and national initiatives (de.NBI, the German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Results may vary: Collaborations Workshop, Oxford 2014 – Carole Goble
Thoughts on computational science reproducibility with a focus on software. Given at the Software Sustainability Institute's 2014 Collaborations Workshop
Video: https://www.youtube.com/watch?v=Rt2oHibJT4k
Technologies such as Hadoop have addressed the "Volume" problem of Big Data, and technologies such as Spark have recently addressed the "Velocity" problem – but the "Variety" problem is largely unaddressed – there is a lot of manual "data wrangling" to manage data models.
These manual processes do not scale well. Not only is the variety of data increasing; the rate of change in the data definitions is increasing too. We can’t keep up. NoSQL data repositories can handle storage, but we need effective models of the data to fully utilize it.
This talk will present tools and a methodology to manage Big Data Models in a rapidly changing world. It covers the following (a minimal illustrative sketch follows the list):
· Creating Semantic Metadata Models of Big Data Resources
· Graphical UI Tools for Big Data Models
· Tools to synchronize Big Data Models and Application Code
· Using NoSQL Databases, such as Amazon DynamoDB, with Big Data Models
· Using Big Data Models with Hadoop, Storm, Spark, Giraph, and Inference
· Using Big Data Models with Machine Learning to generate Predictive Models
· Developer Collaborative/Coordination processes using Big Data Models and Git
· Managing change – Big Data Models with rapidly changing Data Resources
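A minimal sketch of the first bullet, under simplifying assumptions (plain Python, illustrative field names and vocabulary URIs, not the speaker's tooling): the data model is itself machine-readable metadata, so records can be validated against it, and a model change is a data change rather than a code change.

```python
# A tiny "semantic metadata model": each field is described by a type and a
# vocabulary URI. Because the model is data, it can change without touching
# the validation code. URIs are illustrative.
MODEL = {
    "sensor_id":   {"type": str,   "uri": "https://example.org/terms/sensorId"},
    "temperature": {"type": float, "uri": "https://example.org/terms/tempC"},
}

def validate(record, model=MODEL):
    """Check a record against the model; return a list of problems."""
    problems = [f"missing field: {f}" for f in model if f not in record]
    problems += [
        f"bad type for {f}: expected {m['type'].__name__}"
        for f, m in model.items()
        if f in record and not isinstance(record[f], m["type"])
    ]
    return problems

print(validate({"sensor_id": "s-17", "temperature": 21.5}))  # -> []
print(validate({"sensor_id": 17}))  # -> two problems
```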
Metadata and Semantics Research Conference, Manchester, UK 2015
Research Objects: why, what and how
In practice the exchange, reuse and reproduction of scientific experiments is hard, dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Neither should they be viewed just as second-class artifacts tethered to publications, but as the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects. The term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?
I’ll present our practical experiences of the why, what and how of Research Objects.
Being FAIR: Enabling Reproducible Data Science – Carole Goble
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
OAIS and Its Applicability for Libraries, Archives, and Digital Repositories... – faflrt
ALA/FAFLRT Workshop on the Open Archival Information System (OAIS). Presented by Robin Dale, RLG. Sponsored by the ALA Federal and Armed Forces Libraries Roundtable (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
This presentation was given by David Kuliman of Elsevier during the NISO event "Content Presentation: Diversity of Formats". The webinar was held on February 10, 2021.
BioCatalogue talk by Carole Goble. In these slides she outlines the reasons behind the BioCatalogue project and presents the BioCatalogue and its goals.
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo... – Carole Goble
Presented at the FAIR Data in Practice Symposium, 16 May 2023, at BioITWorld Boston. https://www.bio-itworldexpo.com/fair-data. The ELIXIR European Research Infrastructure for Life Science data is an inter-governmental organisation coordinating, integrating and sustaining FAIR data and software resources across its 23 nations. To help advise users, data stewards, project managers and service providers, ELIXIR has developed complementary community-driven, open knowledge resources for guiding FAIR Research Data Management (RDMkit) and providing FAIRification recipes (FAIRCookbook). 150+ people have contributed content so far, including representatives of the pharmaceutical industry.
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research... – Carole Goble
Invited talk, PHIL_OS, March 30-31 2023, Exeter
https://opensciencestudies.eu/whither-open-science. Includes hidden slides.
FAIR and Open Science need Digital Research Infrastructure, which is a federated system of systems and needs funding models that are fit for purpose.
Culture change is needed in paying for Open Science’s infrastructure, and funding support for data-driven research needs more reality and less rhetoric.
Research Software Sustainability takes a Village – Carole Goble
The Research Software Alliance (ReSA) and the Netherlands eScience Center hosted a two-day international workshop to set the future agenda for national and international funders to support sustainable research software.
As the importance of software in research has become increasingly apparent, so has the urgent need to sustain it. Funders can play a crucial role in this respect by ensuring structural support. Over the past few years, a variety of methods for sustaining research software have been explored, including improving and extending funding policies and instruments. During the workshop, funding organizations joined forces to explore how they can effectively contribute to making research software sustainable.
This keynote helped frame the discussion from the perspective of community involvement in research software sustainability.
https://future-of-research-software.org/
This talk is available as: Goble, Carole (2022, November 8). Research Software Sustainability takes a Village. International funders workshop, The Future of Research Software, Amsterdam, The Netherlands. Zenodo. https://doi.org/10.5281/zenodo.7304596
“Bioscience has emerged as a data-rich discipline, in a transformation that is spreading as widely now as molecular biology in the twentieth century. We look forward to supporting new research careers, where data are valued and shared widely, where new software is a natural part of Biology, and where re-analysis and modelling are as creative as experimentation in understanding the rules of life and their applications.” Prof Andrew Millar FRS, chair Expert Group UKRI-BBSRC Review of data-intensive bioscience 2020.
Indeed, biomedical science is knowledge work and knowledge turning – the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR and Open flow and availability of data and methods for automated processing and reproducible results, and on a society of scientists coordinating and collaborating.
For the past 25 years I have worked on the social and technical challenges in digital infrastructure to support scientific collaboration, data and method sharing, and the automation of scientific processing. The big ideas I have been instrumental in – sharing and publishing high-quality computational workflows, semantic web technologies in bioscience, ecosystems of Research Objects as the currency of scholarly knowledge, the FAIR data principles – preached revolution to inspire, but need nudges* to get traction.
I’ll talk about making good on Andrew’s quote: what I’m doing to nudge and where we need to do more. I’ll also talk about my experiences as a woman in digital infrastructure and computer science over the past 40 years – some nudging is needed there too.
*Thaler RH, Sunstein CR (2008) Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press. ISBN 978-0-14-311526-7. OCLC 791403664.
https://www.bsc.es/research-and-development/research-seminars/hybrid-bsc-rslife-sessionbioinfo4women-seminar-love-money-fame-nudge-enabling-data-intensive
Open Research: Manchester leading and learning – Carole Goble
Open and FAIR science has international momentum. Large-scale communities are striving to make and manage the digital infrastructure needed for scientists to be as open as possible and as closed as necessary, as expected by the NIH, OECD, UNESCO and the EC. ELIXIR is such a research infrastructure in Europe for the Life Sciences. This talk will highlight two of ELIXIR's Open Science resources, built by Open Science communities to enable life science researchers to be open and led by Manchester, and ask how we can learn from these and bring these practices to Manchester.
Launch: Manchester Office for Open Research, 4th April 2022
https://www.openresearch.manchester.ac.uk/
RDMkit, a Research Data Management Toolkit. Built by the Community for the ... – Carole Goble
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community, to provide the RDM information they need. It is a framework for advice and best practice for RDM, and acts as a hub of RDM information, with links to tool registries, training materials, standards and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, the toolkit now has nearly 100 pages of content and links to more than 300 tools, provided by over 120 contributors. Content covers the data lifecycle, specialized domains in biology, national considerations, and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage, built on plain GitHub, and the whole development and contribution approach is tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit: aims and context, content, community management, how folks can contribute, and future plans, including prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
How are we Faring with FAIR? (and what FAIR is not) – Carole Goble
Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably, it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data, and to prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects, a lot of positioning of interests, and raised awareness. COVID has injected impetus and urgency into the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original authors of the principles; as a participant in many current FAIR initiatives (particularly in the biomedical sector, and for research objects other than data); and as a veteran of FAIR before we had the principles.
FAIRy stories: tales from building the FAIR Research Commons – Carole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down, like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVeL), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation, and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, are now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations, and the portability and sustainability of processing methods, to enable transparent and reproducible results. All this is within the context of a bottom-up society of collaborating (or burdened?) scientists, a top-down collective of compliance-focused funders and policy makers, and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations, not standards. They are multi-dimensional and dependent on context, such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile”: “FAIRifying” destination community archive repositories and measuring their “compliance” to FAIR metrics (or, less controversially, “indicators”). But what about FAIR at the first mile, at source: how do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? And what about FAIR beyond just data, like exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner research projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of HTS precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars, let's go on a FAIR safari! Let's peruse the ecosystem, observe the different herds, and reflect on where we are for FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
Reflections on a (slightly unusual) multi-disciplinary academic career – Carole Goble
Talk given at the School of Computer Science, The University of Manchester, UK Postgraduate Research Symposium 2019
The Carole Goble Doctoral Paper award was given for the first time.
2. Research Object Community Update
Carole Goble, Stian Soiland-Reyes, Sean Bechhofer
The University of Manchester, UK
carole.goble@manchester.ac.uk
RO2018, 29 October 2018, Amsterdam
Satellite workshop of the IEEE 14th International Conference on e-Science 2018
3. 2010 – Research Objects not PDFs
Research has many components, of many types.
Exchange of all the components of an investigation.
Computational instruments break or need to be maintained.
Science, and its products, evolve.
5. Research Object Framework
Bechhofer et al (2013) https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) https://eprints.soton.ac.uk/268555/
A standards-based generic metadata framework to:
- carry machine-processable metadata, common and specific to different object types
- bundle together and relate digital resources with their context
- snapshot, cite and exchange
Bundled with their context:
- Data used and results produced
- Methods used to produce/analyse that data
- Provenance and settings
- People involved
- Annotations, understanding & interpretation
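To make the "machine-processable metadata" concrete: a Research Object typically carries a JSON-LD manifest that enumerates the aggregated resources and the annotations relating them to their context. The Python sketch below is illustrative only; it loosely follows the RO Bundle manifest vocabulary, and the resource names and author are invented for the example.

# Sketch only: the shape of a Research Object manifest, loosely
# modelled on the RO Bundle JSON-LD vocabulary. Resource names and
# the author are hypothetical.
import json
from datetime import datetime, timezone

manifest = {
    "@context": ["https://w3id.org/bundle/context"],  # RO Bundle JSON-LD context
    "id": "/",
    "createdOn": datetime.now(timezone.utc).isoformat(),
    "createdBy": {"name": "Alice Researcher"},        # hypothetical author
    # The digital resources bundled together: data, methods, results...
    "aggregates": [
        {"uri": "data/measurements.csv"},
        {"uri": "workflow/analysis.cwl"},
        # "unbounded" objects: external references are aggregated too
        {"uri": "http://example.org/external/reference-genome.fa"},
    ],
    # Annotations relate resources to their context and interpretation
    "annotations": [
        {"about": "data/measurements.csv",
         "content": "annotations/measurements-provenance.ttl"},
    ],
}

print(json.dumps(manifest, indent=2))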
6. Howard Ratner, Chair STM Future Labs Committee, EVP Nature Publishing Group, Director of Development for CHORUS (Clearinghouse for the Open Research of US)
STM Innovations Seminar 2012
http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5
7. Container
"Unbounded" Objects: bags of things and external references to things.
A Digital Package Object: a type composed of many interrelated elements that bundles together and relates digital resources of a scientific investigation with context.
A Metadata Object: represents properties in common across all research artefact types, with common PIDs and metadata.
9. Workflow RO
Workflow (CWL): describe and run workflows, and the command line tools they orchestrate, supporting containers, to be portable, transparent and interoperable. Describe the workflow inputs, outputs, tools and data with controlled vocabularies/ontologies (e.g. EDAM). Describe the provenance of the workflow. Software components are containerised to be portable. Workflow systems run the CWL workflow.
Metadata: gather the CWL workflow descriptions plus rich context and provenance, using multi-tiered descriptions. Snapshot the workflow and relate it to other objects.
Container: archive formats to contain the object.
https://www.commonwl.org/
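One concrete route to such a Workflow RO: cwltool, the CWL reference implementation, can capture a workflow run as a research object via its --provenance option. A minimal sketch, assuming cwltool is installed; workflow.cwl and job.yml are hypothetical placeholder files.

# Sketch: run a CWL workflow with cwltool and capture a CWLProv-style
# research object of the run. Assumes `cwltool` is on PATH;
# workflow.cwl and job.yml are hypothetical placeholders.
import subprocess

result = subprocess.run(
    [
        "cwltool",
        "--provenance", "my_run_ro",  # write a research object to this folder
        "workflow.cwl",               # the workflow description
        "job.yml",                    # the input job order (parameters)
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
# my_run_ro/ now holds the workflow, its inputs/outputs by checksum,
# and provenance of the execution, packaged as a research object.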
17. Container Profiles
Research Object Bundle 1.0: a specification for a structured ZIP-file, based on the ePub and Adobe UCF specifications.
https://researchobject.github.io/specifications/bundle/
BagIt: specifies a file system structure for transferring and archiving a collection of files, including their checksums (to verify and validate content) and brief metadata; a mechanism for serialization and transport consistency.
bagit-ro: a profile for a BagIt bag that also captures the identity, annotations and provenance of the resources.
https://github.com/ResearchObject/bagit-ro
BDBag: Big Data bags for collections of arbitrary referenced content.
https://github.com/fair-research/bdbag
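For flavour, a structured ZIP of the RO Bundle kind can be written with the Python standard library alone. The sketch below follows the ePub/UCF convention of an uncompressed mimetype entry as the first file; the media type is the one the RO Bundle specification uses, while the bundle's file names and contents are invented for the example.

# Sketch: write a minimal RO Bundle-style structured ZIP.
# Following the ePub/UCF convention, the first entry is an
# uncompressed "mimetype" file declaring the media type.
import json
import zipfile

with zipfile.ZipFile("example.bundle.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    # mimetype must be first and stored without compression
    info = zipfile.ZipInfo("mimetype")
    zf.writestr(info, "application/vnd.wf4ever.robundle+zip",
                compress_type=zipfile.ZIP_STORED)
    # the manifest lives under .ro/ and lists what the bundle aggregates
    manifest = {"@context": ["https://w3id.org/bundle/context"],
                "id": "/",
                "aggregates": [{"uri": "data/results.csv"}]}
    zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    # an embedded resource that the manifest aggregates (illustrative)
    zf.writestr("data/results.csv", "sample,value\nA,1\n")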
18. Manifest Profile Description
Encoding manifest content profiles using Linked Data and RDF Shapes supports general construction & validation tooling.
Shapes Constraint Language (SHACL): validate graph-based data against a set of conditions.
Minim model for defining checklists: Gamble, Zhao, Klyne, Goble, IEEE eScience 2012, http://dx.doi.org/10.1109/eScience.2012.6404489
ro-show:
- RO pre-processing to merge to a single graph
- an RDF Shape that indicates to follow links
- bespoke validators / unpackers to iterate over the RO
[Lilian Gorea, Oluwatomide Fasugba 2018]
19. Research Object drivers
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, DOI: 10.1007/978-3-642-37186-8_1
- Exchange & Commons
- Preservation and fixed-point publishing
- Reproducibility and execution
- Active "release" research
24. Acknowledgements
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Mark Robinson
Alan Williams
Norman Morrison
Stian Soiland-Reyes
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Michael Crusoe
Rob Finn
Carl Kesselman
Ian Foster
Kyle Chard
Vahan Simonyan
Ravi Madduri
Raja Mazumder
Gil Alterovitz
Denis Dean II
Durga Addepalli
Wouter Haak
Anita De Waard
Paul Groth
Oscar Corcho
Josh Sommer
Project ID: 675728
Editor's Notes
Nested content. Heterogeneous elements. Distributed and embedded content. Externally stewarded content. Checklists + checksums.
Standards-based generic metadata framework.
A unit for exchanges and turning…
Validation, Verification; Provenance; Dependencies, Versions; Checklists, Variance; Portability; Transparent Processes.
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
Fixed and Living
This is what we are turning
More than Data, More than one store
Let's make this more concrete before we go into the HOW. A standards-based metadata framework for logically and physically bundling resources with context: http://researchobject.org. Bigger on the inside than the outside: external referencing.
Research Object Framework: a standards-based metadata framework for logically and physically bundling resources with context.
In Seven Bridges Language: the abstract one is the task (blue); the results (green) are the analysis; studies are lots of analyses.
Community-led standard way of expressing and running workflows and the command line tools they orchestrate, supporting containers for portability.
JSON-LD
Predated the FAIR Principles
Element enumeration
Identification & citation
Description tracking attributes (metadata) and origins (provenance) of contents.
Simplicity - low user overhead and thin (no) client
Research Objects are designed to: be tailored to be domain or type specific; work at many levels of granularity, with their own identifiers, citation metadata; and be snapshots or living to suit their place in the research lifecycle.
Data used and results produced …
Methods employed to produce and analyse that data …
Provenance and settings …
People involved …
Annotations understanding & interpretation …
Describe how to describe
It’s a scaffold
In the presentation we outline features of the latest specifications (https://w3id.org/ro/2016-01-28)
Under the hood building blocks
BagIt is an Internet Draft that specifies a file system structure for transferring and archiving a collection of files, including their checksums and brief metadata.
Research Object Bundle is a specification for a structured ZIP-file, based on the ePub and Adobe UCF specifications. The bundle serializes a Research Object, embedding some or all of its resources within the ZIP file, and lists the RO content in a manifest, in addition to embedding and referencing annotations and provenance.
A BagIt bag can be considered a mechanism for serialization and transport consistency, while a Research Object can be considered a way to capture the identity, annotations and provenance of the resources. As such, the two formats complement each other. They are, however, not directly compatible.
This GitHub repository explains by example a profile for a BagIt bag to also be a Research Object.
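On the BagIt side, the Library of Congress bagit package for Python implements bag creation and validation. A minimal sketch follows; the directory name and bag-info values are illustrative.

# Sketch: turn a directory of files into a BagIt bag with checksums,
# using the `bagit` package (pip install bagit).
# "my_research_object/" and the bag-info values are hypothetical.
import bagit

bag = bagit.make_bag(
    "my_research_object",                   # directory to bag (modified in place)
    {"Source-Organization": "The University of Manchester",
     "Contact-Name": "Alice Researcher"},   # written to bag-info.txt
    checksums=["sha256"],                   # manifest-sha256.txt per payload file
)

# Later (or elsewhere): verify completeness and checksums
bag = bagit.Bag("my_research_object")
if bag.is_valid():
    print("bag payload verified against its manifests")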
https://www.slideshare.net/jelabra/shex-vs-shacl
ShEx is schema based
SHACL is constraint based
encoding manifest content profiles using Linked Data and RDF Shapes in order to support general validation tools (https://github.com/researchobject/ro-show)
Validating graph-based data against a set of conditions. Among others, SHACL includes features to express conditions that constrain the number of values that a property may have, the type of such values, numeric ranges, string matching patterns, and logical combinations of such constraints. SHACL also includes an extension mechanism to express more complex conditions in languages such as SPARQL. A SHACL validation engine takes as input a data graph and a graph containing shape declarations, and produces a validation report that can be consumed by tools. All these graphs can be represented in any Resource Description Framework (RDF) serialization format, including JSON-LD or Turtle. The adoption of SHACL may influence the future of linked data.
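This kind of validation can be scripted with, for example, the pySHACL library. In the sketch below the shapes and data graphs are toy examples, not the actual Research Object manifest profile.

# Sketch: validate an RDF data graph against a SHACL shapes graph
# using pySHACL (pip install pyshacl). The shapes and data are toy
# examples, not the Research Object manifest profile itself.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .

ex:ResourceShape a sh:NodeShape ;
    sh:targetClass ex:AggregatedResource ;
    sh:property [ sh:path dct:creator ; sh:minCount 1 ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .

ex:results a ex:AggregatedResource .   # no dct:creator, so should fail
""", format="turtle")

conforms, report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: the shape requires at least one dct:creator
print(report_text)   # human-readable validation report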
Exchange & Commons: the transfer of knowledge, data and results between services and actors, and the development of RO Commons to enable reuse and sharing. References to remote content allow for access management due to privacy restrictions or data scales.
Commons & Catalogues: publishing, exchange between people and platforms, sharing, training.
Preservation: snapshots of the state of a collection of resources for fixed-point publishing. Conservation, repair, archive.
Reproducibility: describing the structured collection, its components and its context in a rich enough way to support inspection and interpretation by people, and re-execution and comparison by computational machinery. Execution, replication, reproducibility, preservation, portability.
Active “release”-oriented research: accumulating metadata to reflect versions, new configurations of content, evolutions, relationships between objects, and metadata reflecting who has added to the object or used it. Active research, release, evolution, snapshots, remixing, comparison, review, automated processing, RO Composer.
Community forking & fragmentation; arose from workflow sharing and preservation.
Atomicity, granularity, aggregation, composition & fragmentation, versioning, forking, cloning, portability, dependency management.