SlideShare a Scribd company logo
The CEDAR Workbench for
Metadata Management
Mark A. Musen, M.D., Ph.D.
Stanford University
musen@Stanford.EDU
A Story
Purvesh Khatri, Ph.D. A self-professed “data parasite”
Gene Expression Omnibus (GEO)
4
Khatri has used GEO to build a career from
the analysis of “other people’s data”
Khatri has a pipeline for
• Searching for datasets in the GEO database
• Elucidating genomic signals
• Confirming those signals in “validation” data
sets
• Making discoveries without ever performing a
primary experiment
Khatri has reused public data sets to
identify genomic signatures …
• For incipient sepsis
• For active tuberculosis
• For distinguishing viral from bacterial
respiratory infection
• For rejection of organ transplants
… and he has never touched a pipette!
But you can be a data parasite only if
scientific data are …
• Available in an accessible repository
• Findable through some sort of search
facility
• Retrievable in a standard format
• Self-describing so that third parties can
make sense of the data
• Intended to be reused, and to outlive the
experiment for which they were collected
But data in most repositories are a mess!
• Investigators view their work as
publishing papers, not leaving a legacy of
reusable data
• Sponsors—both public and private—may
require data sharing, but they do not
explicitly pay for it
• Creating the metadata to describe data
sets is unbearably hard
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Failure to use standard terms makes
datasets often impossible to search
Metadata from NCBI BioSample Stink!
• 73% of “Boolean” metadata values are not
actually true or false
– nonsmoker, former-smoker
• 26% of “integer” metadata values cannot be
parsed into integers
– JM52, UVPgt59.4, pig
• 68% of metadata entries that are supposed to
represent terms from biomedical ontologies do
not actually do so.
– presumed normal, wild_type
At a minimum, science needs
• Open, online access to experimental data sets
• Mechanisms to search for metadata to find
relevant experimental results
• Annotation of online data sets with adequate
metadata
• Use of controlled terms in metadata whenever
possible
• A culture that cares about data sharing,
secondary analyses, and the ability to make new
discoveries from legacy data
Requirement #1: Standard
ontologies to describe what
exists in a dataset completely
and consistently
http://bioportal.bioontology.org
BioPortal User Activity—Interactive
BioPortal User Activity—Programmatic
Requirement #2: Describe
properties of experiments
completely and consistently
When we look at experimental data …
• How can we tell what the data are about?
• How can we search for additional data that we
might be interested in?
• How can we group data into similar
categories?
The microarray community took the lead in
standardizing biological metadata
DNA Microarray
— What was the
substrate of the
experiment?
— What array
platform was
used?
— What were the
experimental
conditions?
But it didn’t stop with MIAME!
• Minimal Information About T Cell Assays
(MIATA)
• Minimal Information Required in the
Annotation of biochemical Models (MIRIAM)
• MINImal MEtagemome Sequence analysis
Standard (MINIMESS)
• Minimal Information Specification For In Situ
Hybridization and Immunohistochemistry
Experiments (MISFISHIE)
FAIRsharing.org lists more than
100 guidelines for reporting
experimental data
Guidelines
Minimal Information Guidelines
are not Models
• MIAME and its kin specify only the
“kinds of things” that investigators should
include in their metadata
• They do not provide a detailed list of standard
metadata elements
• They do not provide datatypes for valid
metadata entries
• It takes work to convert a prose checklist into
a computable model
To make sense of metadata …
• We need computer-based standards
– For what we want to say about experimental data
– For the language that we use to say those things
– For how we structure the metadata
• We need a scientific culture that values
making experimental data accessible to others
Requirement #3: Make it
palatable—maybe even fun—to
describe experiments completely
and consistently
http://metadatacenter.org
A Metadata Ecosystem
• Scientists perform experiments and generate
datasets
• Community-based standards groups create
minimal information models (like MIAME) to
define requirements for annotating particular
kinds of experimental data in a uniform manner
• CEDAR users create formal metadata templates
based on community standards
• CEDAR users fill in metadata templates to create
instances of metadata
• CEDAR stores the datasets (and the metadata
instances) in public repositories
The CEDAR Approach to Metadata
The CEDAR Workbench
The CEDAR Workbench
x
brain
Parkinson’s disease (DOID) (39%)
central nervous system lymphoma (DOID) (27%)
autistic disorder (DOID) (22%)
melanoma (DOID) (5%)
schizophrenia (DOID) (1%)
Edwards syndrome (DOID) (2%)
The CEDAR Workbench
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
AIRR is providing our first experience
uploading CEDAR-authored
metadata directly to NCBI
Scientists
Data
Curators
MiAIRR
Template
generate
experimental
data
Submission
Manager
Template
Designer
Submission Service
create
template
enter
metadata
manage
submissions
submission status
Biomedical Ontologies
1. TEMPLATE CREATION
2. METADATA EDITING
3. SUBMISSION
Submission Generator
Submission Status Monitor
Metadata
Editor
Metadata
Data
Metadata
Metadata
Data
BioProject
BioSample
SRA
NCBI Repositories
Metadata
Data
check
submission
status
MiAIRR
Standard
NCBI FTP
Server
Metadata
AIRR: CEDAR serves as a conduit to
various NCBI repositories
LINCS Repository
generate
experimental
data
1. TEMPLATE CREATION
2. METADATA EDITING
3. SUBMISSION
Metadata
Editor
Template
Designer
Biomedical OntologiesCurators
create
template
Scientists
Metadata
Data
enter
metadata
LINCS
Templates
Templat
es
Templat
es
LINCS
Templates
Data
submit
Metadata
Data
Metadata
LINCS: CEDAR provides access to a
central data-coordinating center
56
GO-FAIR: Sponsors use CEDAR to
declare minimal metadata for grantees
With CEDAR, the
data parasites of
the world will be
able to learn from
data that are
“cleaner,” better
organized, and
more searchable
Scientists will learn
when new results
become available
that are relevant to
their work, and how
those results relate
to the data that
they are already
collecting
Physicians will be
able to find the
clinical evidence
that best helps
them to select
treatments for their
patients
Pharma will know
what experiments
have already been
done—by their own
company and by
those with which
they have merged!
Once scientific knowledge and data are
disseminated in machine-interpretable form …
• Technology such as CEDAR will assist in the
automated “publication” of scientific results
online
• Intelligent agents will
– Search and “read” the “literature”
– Integrate information
– Track scientific advances
– Re-explore existing scientific datasets
– Suggest the next set of experiments to perform
– And maybe even do them!
http://metadatacenter.org

More Related Content

What's hot

Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
Richard Layton
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
Paul Agapow
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
Carole Goble
 
Machine learning in biology
Machine learning in biologyMachine learning in biology
Machine learning in biology
Pranavathiyani G
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Pistoia Alliance
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
Carole Goble
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
Paul Groth
 
Considerations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflowConsiderations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflow
Eagle Genomics
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
Pistoia Alliance
 
ELRIG Event Biocity Scotland May19
ELRIG Event Biocity Scotland May19ELRIG Event Biocity Scotland May19
ELRIG Event Biocity Scotland May19
Angelo Pugliese
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
Duncan Hull
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
Pistoia Alliance
 
Link Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked DataLink Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked Data
Michel Dumontier
 
An Introduction to Machine Learning and Genomics
An Introduction to Machine Learning and GenomicsAn Introduction to Machine Learning and Genomics
An Introduction to Machine Learning and Genomics
Brittany Lasseigne, Ph.D.
 
Digital transformation of translational medicine
Digital transformation of translational medicineDigital transformation of translational medicine
Digital transformation of translational medicine
Eagle Genomics
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
Paul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
Paul Groth
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
CINECAProject
 

What's hot (20)

Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Machine learning in biology
Machine learning in biologyMachine learning in biology
Machine learning in biology
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma
 
The Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture DataThe Roots: Linked data and the foundations of successful Agriculture Data
The Roots: Linked data and the foundations of successful Agriculture Data
 
Considerations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflowConsiderations and challenges in building an end to-end microbiome workflow
Considerations and challenges in building an end to-end microbiome workflow
 
Fairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matricesFairification experience clarifying the semantics of data matrices
Fairification experience clarifying the semantics of data matrices
 
ELRIG Event Biocity Scotland May19
ELRIG Event Biocity Scotland May19ELRIG Event Biocity Scotland May19
ELRIG Event Biocity Scotland May19
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
 
Link Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked DataLink Analysis of Life Sciences Linked Data
Link Analysis of Life Sciences Linked Data
 
An Introduction to Machine Learning and Genomics
An Introduction to Machine Learning and GenomicsAn Introduction to Machine Learning and Genomics
An Introduction to Machine Learning and Genomics
 
Digital transformation of translational medicine
Digital transformation of translational medicineDigital transformation of translational medicine
Digital transformation of translational medicine
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
CINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIRCINECA webinar slides: Making cohort data FAIR
CINECA webinar slides: Making cohort data FAIR
 

Similar to CEDAR work bench for metadata management

dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
dkNET
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Michel Dumontier
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
Russ Altman
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
Research Information Network
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
Carole Goble
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentMetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
Amrapali Zaveri, PhD
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?
Varsha Khodiyar
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015
IRIDA_community
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
William Hsiao
 
Quality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic Modeling
Stuti Nayak
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experimentsHelena Deus
 
Ontology based phenotype database and mining tool
Ontology based phenotype database and mining toolOntology based phenotype database and mining tool
Ontology based phenotype database and mining toolJennifer Smith
 
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
Adam Ford
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
GigaScience, BGI Hong Kong
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019
Ian Fore
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
Carole Goble
 
OSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery indexOSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery index
Open Science Fair
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
Chris Dwan
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Remedy Informatics
 

Similar to CEDAR work bench for metadata management (20)

dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
dkNET Webinar - FAIR Data Require Better Metadata: The Case for CEDAR 11/13/2020
 
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental MetadataMaking it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Data sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK StoryData sharing - Data management - The SysMO-SEEK Story
Data sharing - Data management - The SysMO-SEEK Story
 
Data management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK StoryData management, data sharing: the SysMO-SEEK Story
Data management, data sharing: the SysMO-SEEK Story
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality AssessmentMetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
MetaCrowd: Crowdsourcing Gene Expression Metadata Quality Assessment
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015
 
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality?  - William HsiaoHow Can We Make Genomic Epidemiology a Widespread Reality?  - William Hsiao
How Can We Make Genomic Epidemiology a Widespread Reality? - William Hsiao
 
Quality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic Modeling
 
provenance of microarray experiments
provenance of microarray experimentsprovenance of microarray experiments
provenance of microarray experiments
 
Ontology based phenotype database and mining tool
Ontology based phenotype database and mining toolOntology based phenotype database and mining tool
Ontology based phenotype database and mining tool
 
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
Ben Goertzel AIs, Superflies and the Path to Immortality - singsum au 2011
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Fore FAIR ISMB 2019
Fore FAIR ISMB 2019Fore FAIR ISMB 2019
Fore FAIR ISMB 2019
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
OSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery indexOSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery index
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
Ontology-Driven Clinical Intelligence: Removing Data Barriers for Cross-Disci...
 

More from Pistoia Alliance

Heartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirtiHeartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirti
Pistoia Alliance
 
Knowledge graphs ilaria maresi the hyve 23apr2020
Knowledge graphs   ilaria maresi the hyve 23apr2020Knowledge graphs   ilaria maresi the hyve 23apr2020
Knowledge graphs ilaria maresi the hyve 23apr2020
Pistoia Alliance
 
Data market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIRData market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIR
Pistoia Alliance
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Pistoia Alliance
 
Implementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcareImplementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcare
Pistoia Alliance
 
Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...
Pistoia Alliance
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences
Pistoia Alliance
 
AI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoEAI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoE
Pistoia Alliance
 
Blockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab SlidesBlockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab Slides
Pistoia Alliance
 
Knowledge Graphs for Pharma PA Slideshow
Knowledge Graphs for Pharma PA SlideshowKnowledge Graphs for Pharma PA Slideshow
Knowledge Graphs for Pharma PA Slideshow
Pistoia Alliance
 
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Pistoia Alliance
 
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinarPistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia Alliance
 
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Pistoia Alliance
 
Pistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseasesPistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance
 
blockchain-introduction-pistoia-alliance
blockchain-introduction-pistoia-allianceblockchain-introduction-pistoia-alliance
blockchain-introduction-pistoia-alliance
Pistoia Alliance
 
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance
 
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance
 
CDx-NGS-webinar
CDx-NGS-webinarCDx-NGS-webinar
CDx-NGS-webinar
Pistoia Alliance
 
Pistoia alliance webinar uxls master slide deck
Pistoia alliance webinar uxls master slide deckPistoia alliance webinar uxls master slide deck
Pistoia alliance webinar uxls master slide deck
Pistoia Alliance
 

More from Pistoia Alliance (19)

Heartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirtiHeartificial intelligence - claudio-mirti
Heartificial intelligence - claudio-mirti
 
Knowledge graphs ilaria maresi the hyve 23apr2020
Knowledge graphs   ilaria maresi the hyve 23apr2020Knowledge graphs   ilaria maresi the hyve 23apr2020
Knowledge graphs ilaria maresi the hyve 23apr2020
 
Data market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIRData market evolution, a future shaped by FAIR
Data market evolution, a future shaped by FAIR
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
 
Implementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcareImplementing Blockchain applications in healthcare
Implementing Blockchain applications in healthcare
 
Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...Building trust and accountability - the role User Experience design can play ...
Building trust and accountability - the role User Experience design can play ...
 
PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences PA webinar on benefits & costs of FAIR implementation in life sciences
PA webinar on benefits & costs of FAIR implementation in life sciences
 
AI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoEAI & ML in Drug Design: Pistoia Alliance CoE
AI & ML in Drug Design: Pistoia Alliance CoE
 
Blockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab SlidesBlockchain and IOT and the GxP Lab Slides
Blockchain and IOT and the GxP Lab Slides
 
Knowledge Graphs for Pharma PA Slideshow
Knowledge Graphs for Pharma PA SlideshowKnowledge Graphs for Pharma PA Slideshow
Knowledge Graphs for Pharma PA Slideshow
 
Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018Data quality supporting AI in Life Sciences webinar 10 dec 2018
Data quality supporting AI in Life Sciences webinar 10 dec 2018
 
Pistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinarPistoia alliance harmonizing fair data catalog approaches webinar
Pistoia alliance harmonizing fair data catalog approaches webinar
 
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
Joint Pistoia Alliance & PRISME AI in pharma webinar 18 Oct 2018
 
Pistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseasesPistoia Alliance datathon for drug repurposing for rare diseases
Pistoia Alliance datathon for drug repurposing for rare diseases
 
blockchain-introduction-pistoia-alliance
blockchain-introduction-pistoia-allianceblockchain-introduction-pistoia-alliance
blockchain-introduction-pistoia-alliance
 
Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2Pistoia Alliance Demystifying AI & ML part 2
Pistoia Alliance Demystifying AI & ML part 2
 
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI  Webina...
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina...
 
CDx-NGS-webinar
CDx-NGS-webinarCDx-NGS-webinar
CDx-NGS-webinar
 
Pistoia alliance webinar uxls master slide deck
Pistoia alliance webinar uxls master slide deckPistoia alliance webinar uxls master slide deck
Pistoia alliance webinar uxls master slide deck
 

Recently uploaded

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 

Recently uploaded (20)

一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 

CEDAR work bench for metadata management

  • 1. The CEDAR Workbench for Metadata Management Mark A. Musen, M.D., Ph.D. Stanford University musen@Stanford.EDU
  • 3. Purvesh Khatri, Ph.D. A self-professed “data parasite”
  • 5. Khatri has used GEO to build a career from the analysis of “other people’s data” Khatri has a pipeline for • Searching for datasets in the GEO database • Elucidating genomic signals • Confirming those signals in “validation” data sets • Making discoveries without ever performing a primary experiment
  • 6. Khatri has reused public data sets to identify genomic signatures … • For incipient sepsis • For active tuberculosis • For distinguishing viral from bacterial respiratory infection • For rejection of organ transplants … and he has never touched a pipette!
  • 7. But you can be a data parasite only if scientific data are … • Available in an accessible repository • Findable through some sort of search facility • Retrievable in a standard format • Self-describing so that third parties can make sense of the data • Intended to be reused, and to outlive the experiment for which they were collected
  • 8. But data in most repositories are a mess! • Investigators view their work as publishing papers, not leaving a legacy of reusable data • Sponsors—both public and private—may require data sharing, but they do not explicitly pay for it • Creating the metadata to describe data sets is unbearably hard
  • 9.
  • 10. age Age AGE `Age age (after birth) age (in years) age (y) age (year) age (years) Age (years) Age (Years) age (yr) age (yr-old) age (yrs) Age (yrs) age [y] age [year] age [years] age in years age of patient Age of patient age of subjects age(years) Age(years) Age(yrs.) Age, year age, years age, yrs age.year age_years Failure to use standard terms makes datasets often impossible to search
  • 11. Metadata from NCBI BioSample Stink! • 73% of “Boolean” metadata values are not actually true or false – nonsmoker, former-smoker • 26% of “integer” metadata values cannot be parsed into integers – JM52, UVPgt59.4, pig • 68% of metadata entries that are supposed to represent terms from biomedical ontologies do not actually do so. – presumed normal, wild_type
  • 12.
  • 13. At a minimum, science needs • Open, online access to experimental data sets • Mechanisms to search for metadata to find relevant experimental results • Annotation of online data sets with adequate metadata • Use of controlled terms in metadata whenever possible • A culture that cares about data sharing, secondary analyses, and the ability to make new discoveries from legacy data
  • 14.
  • 15. Requirement #1: Standard ontologies to describe what exists in a dataset completely and consistently
  • 16.
  • 17.
  • 18.
  • 19.
  • 21.
  • 24. Requirement #2: Describe properties of experiments completely and consistently
  • 25. When we look at experimental data … • How can we tell what the data are about? • How can we search for additional data that we might be interested in? • How can we group data into similar categories?
  • 26. The microarray community took the lead in standardizing biological metadata DNA Microarray — What was the substrate of the experiment? — What array platform was used? — What were the experimental conditions?
  • 27.
  • 28. But it didn’t stop with MIAME! • Minimal Information About T Cell Assays (MIATA) • Minimal Information Required in the Annotation of biochemical Models (MIRIAM) • MINImal MEtagemome Sequence analysis Standard (MINIMESS) • Minimal Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)
  • 29. FAIRsharing.org lists more than 100 guidelines for reporting experimental data Guidelines
  • 30. Minimal Information Guidelines are not Models • MIAME and its kin specify only the “kinds of things” that investigators should include in their metadata • They do not provide a detailed list of standard metadata elements • They do not provide datatypes for valid metadata entries • It takes work to convert a prose checklist into a computable model
  • 31. To make sense of metadata … • We need computer-based standards – For what we want to say about experimental data – For the language that we use to say those things – For how we structure the metadata • We need a scientific culture that values making experimental data accessible to others
  • 32. Requirement #3: Make it palatable—maybe even fun—to describe experiments completely and consistently
  • 34. A Metadata Ecosystem • Scientists perform experiments and generate datasets • Community-based standards groups create minimal information models (like MIAME) to define requirements for annotating particular kinds of experimental data in a uniform manner • CEDAR users create formal metadata templates based on community standards • CEDAR users fill in metadata templates to create instances of metadata • CEDAR stores the datasets (and the metadata instances) in public repositories
  • 35. The CEDAR Approach to Metadata
  • 36.
  • 37.
  • 38.
  • 40.
  • 41.
  • 42.
  • 44. x
  • 45.
  • 46.
  • 47. brain Parkinson’s disease (DOID) (39%) central nervous system lymphoma (DOID) (27%) autistic disorder (DOID) (22%) melanoma (DOID) (5%) schizophrenia (DOID) (1%) Edwards syndrome (DOID) (2%)
  • 49.
  • 50. age Age AGE `Age age (after birth) age (in years) age (y) age (year) age (years) Age (years) Age (Years) age (yr) age (yr-old) age (yrs) Age (yrs) age [y] age [year] age [years] age in years age of patient Age of patient age of subjects age(years) Age(years) Age(yrs.) Age, year age, years age, yrs age.year age_years
  • 51.
  • 52. AIRR is providing our first experience uploading CEDAR-authored metadata directly to NCBI
  • 53. Scientists Data Curators MiAIRR Template generate experimental data Submission Manager Template Designer Submission Service create template enter metadata manage submissions submission status Biomedical Ontologies 1. TEMPLATE CREATION 2. METADATA EDITING 3. SUBMISSION Submission Generator Submission Status Monitor Metadata Editor Metadata Data Metadata Metadata Data BioProject BioSample SRA NCBI Repositories Metadata Data check submission status MiAIRR Standard NCBI FTP Server Metadata AIRR: CEDAR serves as a conduit to various NCBI repositories
  • 54.
  • 55. LINCS Repository generate experimental data 1. TEMPLATE CREATION 2. METADATA EDITING 3. SUBMISSION Metadata Editor Template Designer Biomedical OntologiesCurators create template Scientists Metadata Data enter metadata LINCS Templates Templat es Templat es LINCS Templates Data submit Metadata Data Metadata LINCS: CEDAR provides access to a central data-coordinating center
  • 56. 56
  • 57. GO-FAIR: Sponsors use CEDAR to declare minimal metadata for grantees
  • 58. With CEDAR, the data parasites of the world will be able to learn from data that are “cleaner,” better organized, and more searchable
  • 59. Scientists will learn when new results become available that are relevant to their work, and how those results relate to the data that they are already collecting
  • 60. Physicians will be able to find the clinical evidence that best helps them to select treatments for their patients
  • 61. Pharma will know what experiments have already been done—by their own company and by those with which they have merged!
  • 62. Once scientific knowledge and data are disseminated in machine-interpretable form … • Technology such as CEDAR will assist in the automated “publication” of scientific results online • Intelligent agents will – Search and “read” the “literature” – Integrate information – Track scientific advances – Re-explore existing scientific datasets – Suggest the next set of experiments to perform – And maybe even do them!