Being FAIR:
Enabling Reproducible
Data Science
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
2018 Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018
Disclosure
Knowledge management
Computational workflows
Sharing and exchange
Reproducibility
Large e-Infrastructure
projects for life science data
The Learning Health System
Phenotypic
Patient Records
Patient cohort building
Patient stratification
Case notes
Discharge notes
Patient cohorts
Patient Multi-omics
Public Reference
repositories
text mining, data mining
data & vocabulary linking
data analytics
Single cell omics
Clinical genomics
Quantitative biology
e-Health
Predictive
models
Sensors Diagnostics
Biomarkers
Imaging
Research Clinical
Biobanks
Scientific
Literature
Patient
Public Health
[Friedman]
An Inspiration
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
Josh Sommer
http://www.chordomafoundation.org/
Accelerate a cure
Accelerate knowledge
exchange
Barriers to Cure
• Access to scientific resources
• Coordination, Collaboration
• Flow of Information
• FAIR Data, FAIR Methods
• FAIR Object Commons
[Josh Sommer]
Goble C., De Roure D., Bechhofer S. (2013) Accelerating Scientists’ Knowledge Turns, https://doi.org/10.1007/978-3-642-37186-8_1
Research Commons
Accelerate inter-lab
knowledge turns
Accumulate knowledge
1. A Research Commons
“… a “cloud-based” platform where investigators can store, share, access, and interact
with digital objects (data, software, [models, SOPs], etc.) generated from …. research.
By connecting the digital objects and making them accessible, the Data Commons is
intended to allow novel scientific research that was not possible before, including
hypothesis generation, discovery, and validation.” https://commonfund.nih.gov/commons
Pooled Resources
Federated
Find and Access
Many entry points
Data + Methods + Models
Clear steps
Transparent
Comprehensible
Replicable
Logged
Accessible
Provenance
Standardised
Harmonised
Combined
Method
Materials
Variations X N
Repeat. Compare.
Log & Track
Provenance
Scale
2. Data-driven Science, Predictive Science
is Software-driven, Method-Driven
3. Reuse and Reproducibility
Is hard for in vivo/vitro and
even for in silico analysis
• OS version
• Revision of scripts
• Data analysis software
versions
• Version of data files
• Command line parameters
written on a napkin
• “Magic” the grad student
knows….
[Keiichiro Ono, Scripps Institute]
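The items in the list above can be recorded automatically at run time rather than left on a napkin. The following is a minimal sketch, not a method from the talk: the function names `sha256_of` and `capture_run_record`, the record layout, and the sample file are all hypothetical, and the script revision is assumed to be supplied by the caller (for example a commit hash).

```python
import hashlib
import json
import platform
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_run_record(data_files, argv=None, script_revision="unknown"):
    """Collect OS, interpreter, script revision, parameters and data versions."""
    return {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "script_revision": script_revision,  # hypothetical: passed in by the caller
        "command_line": list(sys.argv if argv is None else argv),
        # A checksum stands in for "version of data files".
        "data_files": {str(Path(p)): sha256_of(p) for p in data_files},
    }


# Usage: record one run against a small, made-up input file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "counts.tsv"
    sample.write_text("gene\tcount\nTP53\t12\n")
    record = capture_run_record(
        [sample], argv=["analyse.py", "--min-count", "5"], script_revision="abc123"
    )
    print(json.dumps(record, indent=2))
```

Storing such a record next to the results covers the machine-readable items; the grad student's "magic" still has to be written down by a person.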
Findable (Citable)
Accessible (Trackable)
Interoperable (Intelligible)
Reusable (Reproducible)
Record
Automate
Contain
Access
FAIR
provenance
portability
preservation
robustness
access
description
standards, common APIs
licensing
standards,
common metadata
versioning, deviation
variation sensitivity
discrepancy handling
parametric spaces
packaging, containers
dependencies
steps
ids
Reproduce and reuse
computations
Transparently communicate
the way computations are
performed
Disambiguate interpretation
of inputs/parameters/results
Safely (re)run computations
ported onto different
platforms
Human and computer
readable definitions for the
provenance of computation,
types for the data and results
Cancer Data Integrator
[Várna, Davies, NIHR Health Informatics Collaborative, UK]
Objects: data + methods + models + provenance +
Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D, SEMS, University of Rostock
zip-like file with a manifest & metadata
- Bundle files - Keep provenance
- Exchange data - Ship results
Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information
to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
Combine Archive
Systems Biology
Systems Medicine
https://sems.unirostock.de/projects/combinearchive/
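The "zip-like file with a manifest & metadata" idea can be illustrated with a few lines of standard-library Python. This is a simplified sketch only: the `manifest.json` layout, the format labels and the function names are assumptions made for illustration, and it does not follow the actual OMEX manifest schema used by the COMBINE archive.

```python
import io
import json
import zipfile


def build_archive(files, metadata):
    """Bundle named payloads into a zip whose manifest lists every entry.

    `files` maps a location inside the archive to a (format, bytes) pair.
    """
    manifest = {
        "metadata": metadata,
        "contents": [
            {"location": name, "format": fmt} for name, (fmt, _) in files.items()
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        for name, (_, payload) in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def read_manifest(archive_bytes):
    """Open an archive produced by build_archive and return its manifest."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return json.loads(archive.read("manifest.json"))


# Usage: bundle a model and its results, then list what the bundle claims to hold.
bundle = build_archive(
    {
        "model.xml": ("model", b"<model/>"),
        "results.csv": ("table", b"time,value\n0,1\n"),
    },
    {"creator": "example author"},
)
for entry in read_manifest(bundle)["contents"]:
    print(entry["location"], entry["format"])
```

The point of the manifest is that a recipient can see what the bundle contains, and in what format, before unpacking or running anything.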
Research Object Framework
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
carry machine
processable metadata
in common and specific
to different object
types.
bundle together and
relate digital
resources with their
context into a unit.
snapshot, cite,
exchange
run, evolve
accumulate
interlink
Standards-based generic
metadata framework
Container
Metadata
Object
metadata, ontologies,
identifiers
“Unbounded” Objects
Bags of things and external references to things
Data used and results produced …
Methods employed to produce and
analyse that data …
Provenance and settings …
People involved …
Annotations understanding & interpretation …
• Co-localizing massive
genomics datasets, like The
Cancer Genome Atlas (TCGA),
alongside secure and
scalable computational
resources to analyze them.
• Analyze own data alongside
TCGA using predefined
analytical workflows or
your own tools.
• Petabyte of multi-
dimensional data available
to authorized researchers.
• Fully reproducible
execution
• Secure team collaboration.
http://www.cancergenomicscloud.org/
NCI Cancer Genomics Cloud (CGC) Pilot
HTS pipelines for precision medicine
GATK: Tumor-Normal Paired Exome-Sequencing pipeline
[Durga Addepalli, Seven Bridges]
Inputs Outputs Analysis
Workflow
Input Data
(Files)
Output Data
(Files)
Software
Component
Settings
(Annotation)
Workflow is defined using Common Workflow Language (CWL)
Software components are Docker images
http://www.cancergenomicscloud.org/
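To make "workflow defined in CWL, software components as Docker images" concrete, here is a minimal sketch that builds a CWL v1.0 `CommandLineTool` description in Python and prints it as JSON (CWL documents are YAML, and JSON is valid YAML). The tool (`wc -l`), the image name `debian:stable-slim`, the input name `reads` and the helper `describe_tool` are placeholders chosen for illustration; this is not the GATK pipeline shown on the slide.

```python
import json


def describe_tool(base_command, docker_image, input_name, stdout_name):
    """Build a CWL v1.0 CommandLineTool description as a plain dictionary."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": list(base_command),
        # The software component is tied to a container image.
        "hints": {"DockerRequirement": {"dockerPull": docker_image}},
        # One input file, passed as the first positional argument.
        "inputs": {input_name: {"type": "File", "inputBinding": {"position": 1}}},
        # Standard output is captured to a file and exposed as the tool's output.
        "stdout": stdout_name,
        "outputs": {"report": {"type": "stdout"}},
    }


tool = describe_tool(["wc", "-l"], "debian:stable-slim", "reads", "line_count.txt")
print(json.dumps(tool, indent=2))
```

Saved to a file, a description like this could be handed to a CWL runner such as `cwltool`. The same description then travels between platforms, while the container supplies the software environment.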
Analysis
Output Files Input Files Intermediates
Parameters
Configurations
Workflow
Run
Provenance
Narrative
Execution
Workflow
Engine
Tools / Codes
Resources
Author Workflow
Container
Metadata
Analysis
Parameters
Configurations
Workflow
Provenance
Workflow
Engine
Algorithms,
Pipelines
Definitions
of the
Metadata
Instances
Data files
Computation
metadata
Tools / Codes
metadata
Biocompute
workflow
Data formats
Ontologies
Data files
Results
Container
Stratified,
Shareable
Objects
Scientifically reliable
interpretation
Verifiable results within
acceptable uncertainty/error
Comparable results
Biocontainers
bio.tools
CWL Viewer
Open standards,
commodity systems
Describe and run workflows, and the
command line tools they orchestrate,
supporting containers to be portable,
transparent and interoperable.
Describe the workflow inputs,
outputs, tools and data with
controlled vocabularies / ontologies
EDAM
Describe the provenance of
the workflow
Software components are
containerised to be portable
Workflow systems run the CWL workflow
Gathers the CWL workflow descriptions
together with rich context and provenance
using multi-tiered descriptions
Snapshots the workflow.
Relates it to other objects.
Uses archive formats to contain the object
A community-driven project
https://www.commonwl.org/
https://view.commonwl.org/workflows/github.com/mnneveau/cancer-genomics-workflow/blob/master/detect_variants/detect_variants.cwl
Manifest
CWL
Annotations
Under the hood
FAIR Methods, different workflow systems & clouds
Living
Products
https://osf.io/h59uh/
Personalized medicine regulation
Standardize exchange of HTS workflows for regulatory submissions between
FDA, pharma, bioinformatics platform providers and researchers
Inspect and replicate the computational analytical workflow to review and
approve the bioinformatics
Domain-specific object model captures essential information without going into
the details of the actual execution.
A community-driven project
Emphasis on robust, safe reuse
Technical Reproducibility
packaging software and providing
required datasets
Human understanding of what has been done
higher level steps of the workflow, their
parameter spaces and algorithm settings
Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication
of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
analysis
and review
sample
archival
sequencing run
file transfer
regulation
computation
pipelines
produced files
are massive in
size
transfer is
slow
too large to keep
forever; not
standardized
difficult to
validate/verify
how can
industry and
FDA work
together to
avoid
mistakes?
HTS lifecycle: from a biological sample
to biomedical research and regulation
[Vahan Simonyan] FDA BAA contract HHSF223201510129C (PI: Raja Mazumder)
https://osf.io/h59uh/
https://doi.org/10.1101/191783
identifiers.org
Under the hood
BioCompute Framework
to advance Regulatory Science to support NGS analysis
Emphasis on robust, safe reuse.
Describe and validate the
metadata of packages, and their
contents, both inside and outside
Standardise data formats and
elements and exchange of
Electronic Health Records
Describe and
validate analysis
workflows, to be
portable and
interoperable
Standardise and support
sharing and analysis of
Genomic data
Ontologies
Controlled vocabularies for
describing all of the above
APIs
Programmable interfaces for
accessing all of the above
Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication
of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
http://sites.ieee.org/sagroups-2791/
Standardisation
Living Objectisms: grow, evolve, mutate
• RO life cycles
– Fixed snapshot
– Living objects
– Rot, mutate, clone
• Arose from workflow
sharing and preservation
• Research Objects are
analogous to software
artefacts and practices
rather than data or
articles
Snapshot Fork
Combine
Validate
Container
Manifest Profile
Descriptions
what
else
is needed
Dependencies
Versioning
its evolution
what
should
be there
Checklists
Provenance
where it
came from
ids
metadata that describes Research Object
general purpose to drive scalable infrastructure
All
Type Specific
Implementation
specific
Container Profile
Under the hood building blocks:
metadata that describes metadata
general purpose to drive scalable infrastructure
Manifest
Construction
Profile
Construction
IDENTIFIER
Many other kinds of objects
Multiple object types in an
investigation
Structured collections of objects
Physical objects, SOPs
These examples were Workflow Objects…
[Sansone]
Asthma Research e-Lab
[Phil Crouch, John
Ainsworth, Iain Buchan]
Chard et al: I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618
DNase Hypersensitivity Analysis
using ENCODE (Encyclopedia of
DNA Elements) access, analysis
and publishing using Galaxy
images and
genome sequences
assembled from diverse
repositories
data distributed across
multiple locations,
referenced because big
and persisted, efficiently
and safely moved on
demand
Assemble and share large scale, multi-element
datasets.
[Chard, Kesselman, Foster, Madduri, 2016]
Richly structured
descriptions of content in the bag and outside it
Transfer and archive very large HTS datasets in a location-
independent way. Secure referencing and moving of patient data.
Big Data
collections of
arbitrary
referenced
content
annotations,
provenance,
relations
checksums
Simple, location independent
persistent identifiers
Define a dataset and its contents by enumerating its elements, regardless of their location
Verify and validate content
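Defining a dataset by enumerating its elements, wherever they live, and then verifying the content can be sketched as follows. This is an illustration under stated assumptions, not the BDBag implementation: the `Element` record, the function names and the `https://example.org/...` locations are hypothetical, and payloads are passed in as bytes so that the sketch stays self-contained instead of fetching anything.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    path: str    # where the element belongs inside the collection
    source: str  # where it currently lives (may be a remote location)
    length: int  # expected size in bytes
    sha256: str  # expected checksum


def describe(path, source, payload):
    """Create the manifest entry for one element from its current content."""
    return Element(path, source, len(payload), hashlib.sha256(payload).hexdigest())


def verify(element, payload):
    """Check retrieved content against the recorded length and checksum."""
    return (
        len(payload) == element.length
        and hashlib.sha256(payload).hexdigest() == element.sha256
    )


def failed_elements(elements, fetched):
    """Return the paths that are missing from `fetched` or fail verification."""
    return [
        e.path
        for e in elements
        if e.path not in fetched or not verify(e, fetched[e.path])
    ]


# Usage: two elements referenced by location, one of which arrives corrupted.
manifest = [
    describe("data/sample1.fastq", "https://example.org/a/sample1.fastq", b"ACGT\n"),
    describe("data/sample2.fastq", "https://example.org/b/sample2.fastq", b"TTGA\n"),
]
retrieved = {"data/sample1.fastq": b"ACGT\n", "data/sample2.fastq": b"TTGX\n"}
print(failed_elements(manifest, retrieved))
```

Because the manifest carries checksums rather than the data itself, it stays small enough to exchange and cite even when the referenced files are very large.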
FAIR Data Commons
3. Everything is a research object: all
the (distributed) components of an
investigation (models, data,
pipelines, SOPs, provenance...) into
citable, exchangeable, publishable,
preserved, nested objects
1. Assemble and share
large scale, multi-
element datasets.
Secure referencing and
moving of patient data.
2. Reproduce, port,
share, and execute
HTS pipelines (and
other analytics …)
The Knowledge Object Reference Ontology (KORO): A formalism to support management and
sharing of computable biomedical knowledge for learning health systems
Flynn, Friedman, Boisvert, Landis‐Lewis, Lagoze (2018), https://doi.org/10.1002/lrh2.10054
Graphs of Research
Objects
Track Research
Objects
Combine and enrich
Research Objects
Learning Health Systems
International Efforts:
FAIR Life Science Data Infrastructure
• EGA in a Box for storing,
coordinating and distributing
human data
• Human Data Beacons discovery
service
• Authentication and
Authorization Infrastructure
Interoperability, Compute, Data,
Tools, Training
Tools and Workflow collaboratory
for EOSC
https://www.elixir-europe.org/use-cases/human-data
Summary: help knowledge turning
• Data Science is underpinned by data
access + transparent methods to
enable reproducible and FAIR
knowledge exchange.
• FAIR First.
• Research Objects as the currency of
reproducibility and exchange
• A bunch of tech, standards, tooling,
best practices, grass roots and
international activities going on.
• Tech isn’t the issue.
• e-Infrastructure matters. Please care
about it.
http://www.researchobject.org/ro2018/
Melissa Haendel, PhD
Director of Translational Data Science, Oregon
State University
Director of the Center for Data to Health,
Oregon Health & Science University
Acknowledgements
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Mark Robinson
Alan Williams
Norman Morrison
Stian Soiland-Reyes
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Michael Crusoe
Rob Finn
Carl Kesselman
Ian Foster
Kyle Chard
Vahan Simonyan
Ravi Madduri
Raja Mazumder
Gil Alterovitz
Denis Dean II
Durga Addepalli
Wouter Haak
Anita De Waard
Paul Groth
Oscar Corcho
Josh Sommer
Project ID: 675728

More Related Content

What's hot

Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016
Carole Goble
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
Norman Morrison
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
Carole Goble
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
Carole Goble
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
Carole Goble
 
ROHub
ROHubROHub
ROHub
Raul Palma
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
Carole Goble
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
Carole Goble
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
Carole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Carole Goble
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
Carole Goble
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
Carole Goble
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
Carole Goble
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
Carole Goble
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
dgarijo
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects help
Carole Goble
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
Carole Goble
 

What's hot (20)

Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
 
Mtsr2015 goble-keynote
Mtsr2015 goble-keynoteMtsr2015 goble-keynote
Mtsr2015 goble-keynote
 
Research Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOMResearch Objects, SEEK and FAIRDOM
Research Objects, SEEK and FAIRDOM
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
ROHub
ROHubROHub
ROHub
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...FAIR Data, Operations and Model management for Systems Biology and Systems Me...
FAIR Data, Operations and Model management for Systems Biology and Systems Me...
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
Reproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects helpReproducible Research: how could Research Objects help
Reproducible Research: how could Research Objects help
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 

Similar to Being FAIR: Enabling Reproducible Data Science

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...European School of Oncology
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
Warren Kibbe
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
C. Tobin Magle
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
GigaScience, BGI Hong Kong
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
Carole Goble
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Syed Ahmad Chan Bukhari, PhD
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
Waqas Tariq
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)Michael Atkins
 
Opportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckOpportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deck
Pistoia Alliance
 
BioData World Basel 2018
BioData World Basel 2018BioData World Basel 2018
BioData World Basel 2018
Anne Deslattes Mays
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009Ian Foster
 
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web ApproachesEnabling Discovery in High-Risk Plaque using Semantic Web Approaches
Enabling Discovery in High-Risk Plaque using Semantic Web Approaches
Tom Plasterer
 
Cancer Analytics Poster
Cancer Analytics PosterCancer Analytics Poster
Cancer Analytics PosterMichael Atkins
 
UK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalfaceUK Digital Curation Centre: enabling research data management at the coalface
UK Digital Curation Centre: enabling research data management at the coalface
LizLyon
 
The Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination PlatformThe Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination Platform
Laura Clarke
 
NCI Cancer Genomic Data Commons for NCAB September 2016
NCI Cancer Genomic Data Commons for NCAB September 2016NCI Cancer Genomic Data Commons for NCAB September 2016
NCI Cancer Genomic Data Commons for NCAB September 2016
Warren Kibbe
 
Branch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiersBranch: An interactive, web-based tool for building decision tree classifiers
Branch: An interactive, web-based tool for building decision tree classifiers
Benjamin Good
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
Robert Grossman
 

Similar to Being FAIR: Enabling Reproducible Data Science (20)

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Reproducible research: theory
Reproducible research: theoryReproducible research: theory
Reproducible research: theory
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ... Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to ...
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Next-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information RetrievalNext-Generation Search Engines for Information Retrieval
Next-Generation Search Engines for Information Retrieval
 
2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)2015 GU-ICBI Poster (third printing)
2015 GU-ICBI Poster (third printing)
 
Opportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckOpportunities for HPC in pharma R&D - main deck
Being FAIR: Enabling Reproducible Data Science

  • 1. Being FAIR: Enabling Reproducible Data Science Professor Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk 2018 Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018
  • 2. Disclosure Knowledge management Computational workflows Sharing and exchange Reproducibility Large e-Infrastructure projects for life science data
  • 3. The Learning Health System Phenotypic Patient Records Patient cohort building Patient stratification Case notes Discharge notes Patient cohorts Patient Multi-omics Public Reference repositories text mining, data mining data & vocabulary linking data analytics Single cell omics Clinical genomics Quantitative biology e-Health Predictive models Sensors Diagnostics Biomarkers Imaging Research Clinical Biobanks Scientific Literature Patient Public Health [Friedman]
  • 5. Barriers to Cure • Access to scientific resources • Coordination, Collaboration • Flow of Information • FAIR Data, FAIR Methods • FAIR Object Commons [Josh Sommer] Goble C., De Roure D., Bechhofer S. (2013) Accelerating Scientists’ Knowledge Turns, https://doi.org/10.1007/978-3-642-37186-8_1 Research Commons Accelerate inter-lab knowledge turns Accumulate knowledge
  • 6. 1. A Research Commons “… a “cloud-based” platform where investigators can store, share, access, and interact with digital objects (data, software, [models, SOPs], etc.) generated from …. research. By connecting the digital objects and making them accessible, the Data Commons is intended to allow novel scientific research that was not possible before, including hypothesis generation, discovery, and validation.” https://commonfund.nih.gov/commons Pooled Resources Federated Find and Access Many entry points Data + Methods + Models
  • 7. Clear steps Transparent Comprehensible Replicable Logged Accessible Provenance Standardised Harmonised Combined Method Materials Variations X N Repeat. Compare. Log & Track Provenance Scale 2. Data-driven Science, Predictive Science is Software-driven, Method-Driven
  • 8. 3. Reuse and Reproducibility Is hard for in vivo/vitro and even for in silico analysis • OS version • Revision of scripts • Data analysis software versions • Version of data files • Command line parameters written on a napkin • “Magic” the grad student knows…. [Keiichiro Ono, Scripps Institute]
  • 9. Findable (Citable) Accessible (Trackable) Interoperable (Intelligible) Reusable (Reproducible) Record Automate Contain Access
  • 10. FAIR provenance portability preservation robustness access description standards, common APIs licensing standards, common metadata versioning, deviation variation sensitivity discrepancy handling parametric spaces packaging, containers dependencies steps ids Reproduce and reuse computations Transparently communicate the way computations are performed Disambiguate interpretation of inputs/parameters/results Safely (re)run computations ported onto different platforms Human and computer readable definitions for the provenance of computation, types for the data and results
  • 11. Cancer Data Integrator [Várna, Davies, NIHR Health Informatics Collaborative, UK]
  • 12.
  • 13. Objects: data + methods + models + provenance + Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D SEMS, University of Rostock zip-like file with a manifest & metadata - Bundle files - Keep provenance - Exchange data - Ship results Bergmann, F.T. (2014). COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1. Combine Archive Systems Biology Systems Medicine https://sems.unirostock.de/projects/combinearchive/
  • 14. Research Object Framework Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/ carry machine processable metadata in common and specific to different object types. bundle together and relate digital resources with their context into a unit. snapshot, cite, exchange run, evolve accumulate interlink Standards-based generic metadata framework
  • 15. Container Metadata Object metadata, ontologies, identifiers “Unbounded” Objects Bags of things and external references to things Data used and results produced … Methods employed to produce and analyse that data … Provenance and settings … People involved … Annotations understanding & interpretation …
  • 16. • Co-localizing massive genomics datasets, like The Cancer Genome Atlas, alongside secure and scalable computational resources to analyze them. • Analyze own data alongside TCGA using predefined analytical workflows or your own tools. • Petabyte of multi-dimensional data available to authorized researchers. • Fully reproducible execution • Secure team collaboration. http://www.cancergenomicscloud.org/ NCI Cancer Genomics Cloud (CGC) Pilot
  • 17. HTS pipelines for precision medicine GATK: Tumor-Normal Paired Exome-Sequencing pipeline [Durga Addepalli, Seven Bridges]
  • 18. HTS pipelines for precision medicine GATK: Tumor-Normal Paired Exome-Sequencing pipeline [Durga Addepalli, Seven Bridges] Inputs Outputs Analysis
  • 19. Workflow Input Data (Files) Output Data (Files) Software Component Settings (Annotation) Workflow is defined using Common Workflow Language (CWL) Software components are Docker images http://www.cancergenomicscloud.org/ Analysis
  • 20. Output Files Input Files Intermediates Parameters Configurations Workflow Run Provenance Narrative Execution Workflow Engine Tools / Codes Resources Author Workflow Container Metadata Analysis
  • 21. Parameters Configurations Workflow Provenance Workflow Engine Algorithms, Pipelines Definitions of the Metadata Instances Data files Computation metadata Tools / Codes metadata Biocompute workflow Data formats Ontologies Data files Results Container Stratified, Shareable Objects Scientifically reliable interpretation Verifiable results within acceptable uncertainty/error Comparable results
  • 22. Parameters Configurations Workflow Provenance Workflow Engine Algorithms, Pipelines Definitions of the Metadata Instances Data files Computation metadata Tools / Codes metadata Biocompute workflow Data formats Ontologies Data files Results Container Biocontainers bio.tools CWL Viewer
  • 23. Open standards, commodity systems Describe and run workflows, and the command line tools they orchestrate, supporting containers to be portable, transparent and interoperable. Describe the workflow inputs, outputs, tools and data with controlled vocabularies / ontologies EDAM Describe the provenance of the workflow Software components are containerised to be portable Workflow systems run the CWL workflow Gathers the CWL workflow descriptions together with rich context and provenance using multi-tiered descriptions Snapshots the workflow. Relates it to other objects. Uses archive formats to contain the object
  • 26. FAIR Methods, different workflow systems & clouds Living Products
  • 27. https://osf.io/h59uh/ Personalized medicine regulation Standardize exchange of HTS workflows for regulatory submissions between FDA, pharma, bioinformatics platform providers and researchers Inspect and replicate the computational analytical workflow to review and approve the bioinformatics Domain-specific object model captures essential information without going into the details of the actual execution. A community-driven project Emphasis on robust, safe reuse Technical Reproducibility packaging software and providing required datasets Human understanding of what has been done higher level steps of the workflow, their parameter spaces and algorithm settings Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
  • 28. analysis and review sample archival sequencing run file transfer regulation computation pipelines produced files are massive in size transfer is slow too large to keep forever; not standardized difficult to validate/verify how can industry and FDA work together to avoid mistakes? HTS lifecycle: from a biological sample to biomedical research and regulation [Vahan Simonyan] FDA BAA contract HHSF223201510129C (PI: Raja Mazumder)
  • 31. BioCompute Framework to advance Regulatory Science to support NGS analysis Emphasis on robust, safe reuse. Describe and validate the metadata of packages, and their contents, both inside and outside Standardise data formats and elements and exchange of Electronic Health Records Describe and validate analysis workflows, to be portable and interoperable Standardise and support sharing and analysis of Genomic data Ontologies Controlled vocabularies for describing all of the above APIs Programmable interfaces for accessing all of the above Alterovitz, Dean II, Goble, Crusoe, Soiland-Reyes et al Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results, biorxiv.org, 2017, https://doi.org/10.1101/191783
  • 33. Living Objectisms: grow, evolve, mutate • RO life cycles – Fixed snapshot – Living objects – Rot, mutate, clone • Arose from workflow sharing and preservation • Research Objects are analogous to software artefacts and practices rather than data or articles Snapshot Fork Combine
  • 34. Validate Container Manifest Profile Descriptions what else is needed Dependencies Versioning its evolution what should be there Checklists Provenance where it came from ids metadata that describes Research Object general purpose to drive scalable infrastructure
  • 35. All Type Specific Implementation specific Container Manifest Profile Descriptions what else is needed Dependencies Versioning its evolution what should be there Checklists Provenance where it came from ids metadata that describes Research Object
  • 36. Container Profile Under the hood building blocks: metadata that describes metadata general purpose to drive scalable infrastructure Manifest Construction Profile Construction IDENTIFIER
  • 37. Many other kinds of objects Multiple object types in an investigation Structured collections of objects Physical objects, SOPs These examples were Workflow Objects… [Sansone] Asthma Research e-Lab [Phil Crouch, John Ainsworth, Iain Buchan]
  • 38. Chard et al: I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets, https://doi.org/10.1109/BigData.2016.7840618 DNase Hypersensitivity Analysis using ENCODE (Encyclopedia of DNA Elements) access, analysis and publishing using Galaxy images and genome sequences assembled from diverse repositories data distributed across multiple locations, referenced because big and persisted, efficiently and safely moved on demand Assemble and share large scale, multi-element datasets. [Chard, Kesselman, Foster, Madduri, 2016]
  • 39. Richly structured descriptions of content in the bag and outside it Transfer and archive very large HTS datasets in a location-independent way. Secure referencing and moving of patient data. Big Data collections of arbitrary referenced content annotations, provenance, relations checksums Simple, location independent persistent identifiers Define a dataset and its contents by enumerating its elements, regardless of their location Verify and validate content
  • 40. FAIR Data Commons 3. Everything is a research object: all the (distributed) components of an investigation (models, data, pipelines, SOPs, provenance...) into citable, exchangeable, publishable, preserved, nested objects 1. Assemble and share large scale, multi-element datasets. Secure referencing and moving of patient data. 2. Reproduce, port, share, and execute HTS pipelines (and other analytics …)
  • 41. The Knowledge Object Reference Ontology (KORO): A formalism to support management and sharing of computable biomedical knowledge for learning health systems Flynn, Friedman, Boisvert, Landis‐Lewis, Lagoze (2018), https://doi.org/10.1002/lrh2.10054 Graphs of Research Objects Track Research Objects Combine and enrich Research Objects Learning Health Systems
  • 42. International Efforts: FAIR Life Science Data Infrastructure • EGA in a Box for storing, coordinating and distributing human data • Human Data Beacons discovery service • Authentication and Authorization Infrastructure Interoperability, Compute, Data, Tools, Training Tools and Workflow collaboratory for EOSC https://www.elixir-europe.org/use-cases/human-data
  • 43. Summary: help knowledge turning • Data Science is underpinned by data access + transparent methods to enable reproducible and FAIR knowledge exchange. • FAIR First. • Research Objects as the currency of reproducibility and exchange • A bunch of tech, standards, tooling, best practices, grass roots and international activities going on. • Tech isn’t the issue. • e-Infrastructure matters. Please care about it.
  • 45. Melissa Haendel, PhD Director of Translational Data Science, Oregon State University Director of the Center for Data to Health, Oregon Health & Science University
  • 46. Acknowledgements Barend Mons Sean Bechhofer Matthew Gamble Raul Palma Jun Zhao Mark Robinson Alan Williams Norman Morrison Stian Soiland-Reyes Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Iain Buchan Michael Crusoe Rob Finn Carl Kesselman Ian Foster Kyle Chard Vahan Simonyan Ravi Madduri Raja Mazumder Gil Alterovitz, Denis Dean II Durga Addepalli Wouter Haak Anita De Waard Paul Groth Oscar Corcho Josh Sommer Project ID: 675728
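
Illustrative sketch (not part of the original slides). Slide 8 lists what usually goes unrecorded in an analysis (OS version, software versions, command line parameters, versions of data files), and slide 39 describes defining a dataset by enumerating its elements, regardless of location, with checksums so that content can be verified. The Python below is one minimal way those two ideas might fit together. The field names, the `build_record` and `verify` helpers, and the example locations are all assumptions made for illustration; this is not the Research Object, BagIt or BioCompute format.

```python
import hashlib
import json
import platform


def sha256_hex(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def build_record(argv, contents):
    """Describe one analysis run (hypothetical record layout).

    argv     -- the command line used for the run
    contents -- mapping of location (path or URL) to the bytes found there
    """
    return {
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "command_line": list(argv),
        # One manifest entry per element, wherever that element lives.
        "manifest": [
            {"location": loc, "length": len(data), "sha256": sha256_hex(data)}
            for loc, data in sorted(contents.items())
        ],
    }


def verify(record, fetch):
    """Return the locations whose current content no longer matches the record."""
    failures = []
    for item in record["manifest"]:
        data = fetch(item["location"])
        if len(data) != item["length"] or sha256_hex(data) != item["sha256"]:
            failures.append(item["location"])
    return failures


# Hypothetical example content, held in memory so the sketch runs anywhere.
store = {
    "data/reads.fastq": b"@read1\nACGT\n+\nFFFF\n",
    "https://example.org/reference.fa": b">chr1\nGGCC\n",
}
record = build_record(["analyse.py", "--threshold", "0.05"], store)
print(json.dumps(record, indent=2))
print(verify(record, store.__getitem__))  # nothing has changed since recording
```

In practice `fetch` would read a local file or download a referenced URL. Passing it in as a function keeps the verification step independent of where each element is stored, which is the location independence the slides emphasise.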