
Being FAIR: Enabling Reproducible Data Science

Talk presented in the Data Science session at the Early Detection of Cancer Conference, OHSU, Portland, Oregon, USA, 2-4 Oct 2018, http://earlydetectionresearch.com/

Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
Disclosure
Knowledge management
Computational workflows
Sharing and exchange
Reproducibility
Large e-Infrastructure projects for life science data
The Learning Health System
Phenotypic
Patient Records
Patient cohort building
Patient stratification
Case notes
Discharge notes
Patient cohorts
Patient Multi-omics
Public Reference repositories
text mining, data mining
data & vocabulary linking
data analytics
Single cell omics
Clinical genomics
Quantitative biology
e-Health
Predictive models
Sensors Diagnostics
Biomarkers
Imaging
Research Clinical
Biobanks
Scientific
Literature
Patient
Public Health
[Friedman]
An Inspiration
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
Josh Sommer
http://www.chordomafoundation.org/
Accelerate a cure
Accelerate knowledge exchange
Barriers to Cure
• Access to scientific resources
• Coordination, Collaboration
• Flow of Information
• FAIR Data, FAIR Methods
• FAIR Object Commons
[Josh Sommer]
Goble C., De Roure D., Bechhofer S. (2013) Accelerating Scientists’ Knowledge Turns, https://doi.org/10.1007/978-3-642-37186-8_1
Research Commons
Accelerate inter-lab knowledge turns
Accumulate knowledge
1. A Research Commons
“… a “cloud-based” platform where investigators can store, share, access, and interact
with digital objects (data, software, [models, SOPs], etc.) generated from …. research.
By connecting the digital objects and making them accessible, the Data Commons is
intended to allow novel scientific research that was not possible before, including
hypothesis generation, discovery, and validation.” https://commonfund.nih.gov/commons
Pooled Resources
Federated
Find and Access
Many entry points
Data + Methods + Models

Planeta 9 - A Pan-STARRS1 Search for Planet NinePlaneta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet NineSérgio Sacani
 

Recently uploaded (20)

An Introduction to Quantum Programming Languages
An Introduction to Quantum Programming LanguagesAn Introduction to Quantum Programming Languages
An Introduction to Quantum Programming Languages
 
Salesforce Starter Package Presentation.
Salesforce Starter Package Presentation.Salesforce Starter Package Presentation.
Salesforce Starter Package Presentation.
 
INTRODUCTION TO PLANT TAXONOMY WITH DIVERSE TAXONOMIC APPROACHES
INTRODUCTION TO PLANT TAXONOMY WITH DIVERSE TAXONOMIC APPROACHESINTRODUCTION TO PLANT TAXONOMY WITH DIVERSE TAXONOMIC APPROACHES
INTRODUCTION TO PLANT TAXONOMY WITH DIVERSE TAXONOMIC APPROACHES
 
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of AstrophysicsOpen Access Publishing in Astrophysics and the Open Journal of Astrophysics
Open Access Publishing in Astrophysics and the Open Journal of Astrophysics
 
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
Earth and Planetary Science | Volume 01 | Issue 01 | April 2022
 
Construction of Magic Squares by Swapping Rows and Columns.pdf
Construction of Magic Squares by Swapping Rows and Columns.pdfConstruction of Magic Squares by Swapping Rows and Columns.pdf
Construction of Magic Squares by Swapping Rows and Columns.pdf
 
Research methods in ethnobotany- Exploring Traditional Wisdom
Research methods in ethnobotany- Exploring Traditional WisdomResearch methods in ethnobotany- Exploring Traditional Wisdom
Research methods in ethnobotany- Exploring Traditional Wisdom
 
Duchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptxDuchenne Muscular Dystrophy or DMD .pptx
Duchenne Muscular Dystrophy or DMD .pptx
 
REARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptxREARING EQUIPMENT IN SERICULTURE . pptx
REARING EQUIPMENT IN SERICULTURE . pptx
 
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
dkNET Webinar: An Encyclopedia of the Adipose Tissue Secretome to Identify Me...
 
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptxExploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
Exploring Artificial Intelligence_ Revolutionizing Tomorrow's World.pptx
 
PROSTHETIC FEET description and its types
PROSTHETIC FEET description and its typesPROSTHETIC FEET description and its types
PROSTHETIC FEET description and its types
 
The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...The ExoGRAVITY project - observations of exoplanets from the ground with opti...
The ExoGRAVITY project - observations of exoplanets from the ground with opti...
 
Quality safety and legislations of cosmetics.pptx
Quality safety and legislations of cosmetics.pptxQuality safety and legislations of cosmetics.pptx
Quality safety and legislations of cosmetics.pptx
 
Hypertension in Children and Adolescents
Hypertension in Children and AdolescentsHypertension in Children and Adolescents
Hypertension in Children and Adolescents
 
2024 Insilicogen Company English Brochure
2024 Insilicogen Company English Brochure2024 Insilicogen Company English Brochure
2024 Insilicogen Company English Brochure
 
Chemistry chapter 1 solutions detailed explanation
Chemistry chapter 1 solutions detailed explanationChemistry chapter 1 solutions detailed explanation
Chemistry chapter 1 solutions detailed explanation
 
Quasar and Microquasar Series - Microquasars in our Galaxy
Quasar and Microquasar Series - Microquasars in our GalaxyQuasar and Microquasar Series - Microquasars in our Galaxy
Quasar and Microquasar Series - Microquasars in our Galaxy
 
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
Anti-Obesity Activity of Anthocyanins and Corresponding Introduction in Dieta...
 
Planeta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet NinePlaneta 9 - A Pan-STARRS1 Search for Planet Nine
Planeta 9 - A Pan-STARRS1 Search for Planet Nine
 

Being FAIR: Enabling Reproducible Data Science

  • 1. Being FAIR: Enabling Reproducible Data Science Professor Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk 2018 Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018
  • 2. Disclosure Knowledge management Computational workflows Sharing and exchange Reproducibility Large e-Infrastructure projects for life science data
  • 3. The Learning Health System Phenotypic Patient Records Patient cohort building Patient stratification Case notes Discharge notes Patient cohorts Patient Multi-omics Public Reference repositories text mining, data mining data & vocabulary linking data analytics Single cell omics Clinical genomics Quantitative biology e-Health Predictive models Sensors Diagnostics Biomarkers Imaging Research Clinical Biobanks Scientific Literature Patient Public Health [Friedman]
  • 5. Barriers to Cure • Access to scientific resources • Coordination, Collaboration • Flow of Information • FAIR Data, FAIR Methods • FAIR Object Commons [Josh Sommer] Goble C., De Roure D., Bechhofer S. (2013) Accelerating Scientists’ Knowledge Turns, https://doi.org/10.1007/978-3-642-37186-8_1 Research Commons Accelerate inter-lab knowledge turns Accumulate knowledge
  • 6. 1. A Research Commons “… a “cloud-based” platform where investigators can store, share, access, and interact with digital objects (data, software, [models, SOPs], etc.) generated from …. research. By connecting the digital objects and making them accessible, the Data Commons is intended to allow novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.” https://commonfund.nih.gov/commons Pooled Resources Federated Find and Access Many entry points Data + Methods + Models
  • 7. Clear steps Transparent Comprehensible Replicable Logged Accessible Provenance Standardised Harmonised Combined Method Materials Variations X N Repeat. Compare. Log & Track Provenance Scale 2. Data-driven Science, Predictive Science is Software-driven, Method-Driven
  • 8. 3. Reuse and Reproducibility Is hard for in vivo/vitro and even for in silico analysis • OS version • Revision of scripts • Data analysis software versions • Version of data files • Command line parameters written on a napkin • “Magic” the grad student knows…. [Keiichiro Ono, Scripps Institute]
  • 9. Findable (Citable) Accessible (Trackable) Interoperable (Intelligible) Reusable (Reproducible) Record Automate Contain Access
  • 10. FAIR provenance portability preservation robustness access description standards, common APIs licensing standards, common metadata versioning, deviation variation sensitivity discrepancy handling parametric spaces packaging, containers dependencies steps ids Reproduce and reuse computations Transparently communicate the way computations are performed Disambiguate interpretation of inputs/parameters/results Safely (re)run computations ported onto different platforms Human and computer readable definitions for the provenance of computation, types for the data and results
  • 11. Cancer Data Integrator [Várna, Davies, NIHR Health Informatics Collaborative, UK]
  • 13. Objects: data + methods + models + provenance + Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D SEMS, University of Rostock zip-like file with a manifest & metadata - Bundle files - Keep provenance - Exchange data - Ship results Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1. Combine Archive Systems Biology Systems Medicine https://sems.unirostock.de/projects/combinearchive/
  • 14. Research Object Framework Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ carry machine processable metadata in common and specific to different object types. bundle together and relate digital resources with their context into a unit. snapshot, cite, exchange run, evolve accumulate interlink Standards-based generic metadata framework
  • 15. Container Metadata Object metadata, ontologies, identifiers “Unbounded” Objects Bags of things and external references to things Data used and results produced … Methods employed to produce and analyse that data … Provenance and settings … People involved … Annotations understanding & interpretation …
  • 16. • Co-localizing massive genomics datasets, like The Cancer Genome Atlas, alongside secure and scalable computational resources to analyze them. • Analyze own data alongside TCGA using predefined analytical workflows or your own tools. • Petabyte of multi-dimensional data available to authorized researchers. • Fully reproducible execution • Secure team collaboration. http://www.cancergenomicscloud.org/ NCI Cancer Genomics Cloud (CGC) Pilot
  • 17. HTS pipelines for precision medicine GATK: Tumor-Normal Paired Exome-Sequencing pipeline [Durga Addepalli, Seven Bridges]
  • 18. HTS pipelines for precision medicine GATK: Tumor-Normal Paired Exome-Sequencing pipeline [Durga Addepalli, Seven Bridges] Inputs Outputs Analysis
  • 19. Workflow Input Data (Files) Output Data (Files) Software Component Settings (Annotation) Workflow is defined using Common Workflow Language (CWL) Software components are Docker images http://www.cancergenomicscloud.org/ Analysis
  • 20. Output Files Input Files Intermediates Parameters Configurations Workflow Run Provenance Narrative Execution Workflow Engine Tools / Codes Resources Author Workflow Container Metadata Analysis
  • 21. Parameters Configurations Workflow Provenance Workflow Engine Algorithms, Pipelines Definitions of the Metadata Instances Data files Computation metadata Tools / Codes metadata Biocompute workflow Data formats Ontologies Data files Results Container Stratified, Shareable Objects Scientifically reliable interpretation Verifiable results within acceptable uncertainty/error Comparable results
  • 22. Parameters Configurations Workflow Provenance Workflow Engine Algorithms, Pipelines Definitions of the Metadata Instances Data files Computation metadata Tools / Codes metadata Biocompute workflow Data formats Ontologies Data files Results Container Biocontainers bio.tools CWL Viewer
  • 23. Open standards, commodity systems Describe and run workflows, and the command line tools they orchestrate, supporting containers to be portable, transparent and interoperable. Describe the workflow inputs, outputs, tools and data with controlled vocabularies / ontologies EDAM Describe the provenance of the workflow Software components are containerised to be portable Workflow systems run the CWL workflow Gathers the CWL workflow descriptions together with rich context and provenance using multi-tiered descriptions Snapshots the workflow. Relates it to other objects. Uses archive formats to contain the object
  • 26. FAIR Methods, different workflow systems & clouds Living Products
  • 27. https://osf.io/h59uh/ Personalized medicine regulation Standardize exchange of HTS workflows for regulatory submissions between FDA, pharma, bioinformatics platform providers and researchers Inspect and replicate the computational analytical workflow to review and approve the bioinformatics Domain-specific object model captures essential information without going into details of the actual execution. A community-driven project Emphasis on robust, safe reuse Technical Reproducibility packaging software and providing required datasets Human understanding of what has been done higher level steps of the workflow, their parameter spaces and algorithm settings Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
  • 28. analysis and review sample archival sequencing run file transfer regulation computation pipelines produced files are massive in size transfer is slow too large to keep forever; not standardized difficult to validate/verify how can industry and FDA work together to avoid mistakes? HTS lifecycle: from a biological sample to biomedical research and regulation [Vahan Simonyan] FDA BAA contract HHSF223201510129C (PI: Raja Mazumder)
  • 31. BioCompute Framework to advance Regulatory Science to support NGS analysis Emphasis on robust, safe reuse. Describe and validate the metadata of packages, and their contents, both inside and outside Standardise data formats and elements and exchange of Electronic Health Records Describe and validate analysis workflows, to be portable and interoperable Standardise and support sharing and analysis of Genomic data Ontologies Controlled vocabularies for describing all of the above APIs Programmable interfaces for accessing all of the above Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
  • 33. Living Objectisms: grow, evolve, mutate • RO life cycles – Fixed snapshot – Living objects – Rot, mutate, clone • Arose from workflow sharing and preservation • Research Objects are analogous to software artefacts and practices rather than data or articles Snapshot Fork Combine
  • 34. Validate Container Manifest Profile Descriptions what else is needed Dependencies Versioning its evolution what should be there Checklists Provenance where it came from ids metadata that describes Research Object general purpose to drive scalable infrastructure
  • 35. All Type Specific Implementation specific Container Manifest Profile Descriptions what else is needed Dependencies Versioning its evolution what should be there Checklists Provenance where it came from ids metadata that describes Research Object
  • 36. Container Profile Under the hood building blocks: metadata that describes metadata general purpose to drive scalable infrastructure Manifest Construction Profile Construction IDENTIFIER
  • 37. Many other kinds of objects Multiple object types in an investigation Structured collections of objects Physical objects, SOPs These examples were Workflow Objects… [Sansone] Asthma Research e-Lab [Phil Crouch, John Ainsworth, Iain Buchan]
  • 38. Chard et al: I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618 DNase Hypersensitivity Analysis using ENCODE (Encyclopedia of DNA Elements) access, analysis and publishing using Galaxy images and genome sequences assembled from diverse repositories data distributed across multiple locations, referenced because big and persisted, efficiently and safely moved on demand Assemble and share large scale, multi-element datasets. [Chard, Kesselman, Foster, Madduri, 2016]
  • 39. Richly structured descriptions of content in the bag and outside it Transfer and archive very large HTS datasets in a location-independent way. Secure referencing and moving of patient data. Big Data collections of arbitrary referenced content annotations, provenance, relations checksums Simple, location independent persistent identifiers Define a dataset and its contents by enumerating its elements, regardless of their location Verify and validate content
  • 40. FAIR Data Commons 3. Everything is a research object: all the (distributed) components of an investigation (models, data, pipelines, SOPs, provenance...) into citable, exchangeable, publishable, preserved, nested objects 1. Assemble and share large scale, multi-element datasets. Secure referencing and moving of patient data. 2. Reproduce, port, share, and execute HTS pipelines (and other analytics …)
  • 41. The Knowledge Object Reference Ontology (KORO): A formalism to support management and sharing of computable biomedical knowledge for learning health systems Flynn, Friedman, Boisvert, Landis‐Lewis, Lagoze (2018), https://doi.org/10.1002/lrh2.10054 Graphs of Research Objects Track Research Objects Combine and enrich Research Objects Learning Health Systems
  • 42. International Efforts: FAIR Life Science Data Infrastructure • EGA in a Box for storing, coordinating and distributing human data • Human Data Beacons discovery service • Authentication and Authorization Infrastructure Interoperability, Compute, Data, Tools, Training Tools and Workflow collaboratory for EOSC https://www.elixir-europe.org/use-cases/human-data
  • 43. Summary: help knowledge turning • Data Science is underpinned by data access + transparent methods to enable reproducible and FAIR knowledge exchange. • FAIR First. • Research Objects as the currency of reproducibility and exchange • A bunch of tech, standards, tooling, best practices, grass roots and international activities going on. • Tech isn’t the issue. • e-Infrastructure matters. Please care about it.
  • 45. Melissa Haendel, PhD Director of Translational Data Science, Oregon State University Director of the Center for Data to Health, Oregon Health & Science University
  • 46. Acknowledgements Barend Mons Sean Bechhofer Matthew Gamble Raul Palma Jun Zhao Mark Robinson Alan Williams Norman Morrison Stian Soiland-Reyes Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Iain Buchan Michael Crusoe Rob Finn Carl Kesselman Ian Foster Kyle Chard Vahan Simonyan Ravi Madduri Raja Mazumder Gil Alterovitz, Denis Dean II Durga Addepalli Wouter Haak Anita De Waard Paul Groth Oscar Corcho Josh Sommer Project ID: 675728
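
Slides 13 and 39 describe objects that bundle files with a manifest, enumerate a dataset's elements regardless of their location, and use checksums to verify and validate content. The sketch below illustrates that general idea only; it is not the COMBINE archive, Research Object, or BDBag format. The function names, the manifest layout, and the example URL are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, remote_refs: dict[str, str]) -> dict:
    """Enumerate local files with checksums; list remote elements by reference.

    The {"local": ..., "remote": ...} layout is hypothetical, not a standard.
    """
    local = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {"local": local, "remote": remote_refs}


def verify(root: Path, manifest: dict) -> list[str]:
    """Return the relative paths that are missing or whose content changed."""
    problems = []
    for rel, expected in manifest["local"].items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "results.csv").write_text("sample,score\nA,0.9\n")
        # Hypothetical remote element: referenced by URL, not copied into the bundle.
        refs = {"reads.fastq.gz": "https://example.org/reads.fastq.gz"}
        manifest = build_manifest(root, refs)
        print(json.dumps(manifest, indent=2))
        print("problems:", verify(root, manifest))
```

Keeping large or sensitive elements as references while checksumming what is held locally mirrors the "referenced because big" point on slide 38; a real deployment would also record checksums and persistent identifiers for the remote elements.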