Supporting Ontology-Based
Standardization of Biomedical Metadata in
the CEDAR Workbench
Marcos Martínez-Romero
Stanford University
metadatacenter.org
CEDAR
CENTER FOR EXPANDED DATA
ANNOTATION AND RETRIEVAL
9/14/2017
age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Metadata are not standardized
Age-Years (NCIT)
(http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C37908)
“The length of a person's life, stated in years since birth.”
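A minimal sketch of how the field-name variants listed above could be mapped to the single NCIT term. It assumes that every variant on the slide is meant as age in years; the qualifier whitelist and the function names are hypothetical and are not part of CEDAR.

```python
import re
from typing import List, Optional

# IRI of the NCIT "Age-Years" term quoted on the slide.
AGE_YEARS_IRI = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C37908"

# Hypothetical whitelist: tokens that follow "age" in the variants listed above.
_QUALIFIERS = {"in", "of", "after", "birth", "patient", "subjects",
               "old", "y", "yr", "yrs", "year", "years"}


def tokenize(field_name: str) -> List[str]:
    """Lower-case the name and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", field_name.lower()) if t]


def map_field_name(field_name: str) -> Optional[str]:
    """Return the Age-Years IRI for an age-like field name, otherwise None."""
    tokens = tokenize(field_name)
    if tokens and tokens[0] == "age" and all(t in _QUALIFIERS for t in tokens[1:]):
        return AGE_YEARS_IRI
    return None


if __name__ == "__main__":
    for name in ["`Age", "age (yr-old)", "Age(yrs.)", "age_years", "weight (kg)"]:
        print(repr(name), "->", map_field_name(name))
```

A whitelist keeps the mapping conservative: a name such as "stage" or "age group" is left unmapped instead of being guessed.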
Because metadata are not standardized, it’s extremely hard to:
– find experimental datasets
– understand how the experiments were performed
– replicate study findings
Generating standard metadata is hard
• Submission formats rarely support ontology terms
• There is no easy way to find ontology terms and include them in metadata submissions
Semantic ecosystem to enable the creation of high-quality metadata in biomedicine
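The CEDAR Workbench
[Architecture diagram. Components: Template Designer, Metadata Editor, Metadata Repository, Biomedical Ontologies, Public Databases (LINCS shown as an example). Flows: template authors design templates in the Template Designer; metadata authors fill in templates with metadata in the Metadata Editor; templates and metadata are held in the Metadata Repository; metadata are submitted to public databases.]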
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "pav": "http://purl.org/pav/",
    // ...
    "Title": "http://purl.obolibrary.org/obo/NGS_0000055",
    "Disorder": "http://purl.org/net/OCRe/OCRe.owl#OCRE900086",
    "Institution": "http://semantic-dicom.org/dcm#InstitutionName",
    "Principal Investigator": "http://purl.org/net/OCRe/OCRe.owl#OCRE901006",
    "Study Type": "http://purl.obolibrary.org/obo/NGS_0000056"
  },
  "@type": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C63536",
  "Title": {
    "@value": "A sample study"
  },
  "Disorder": {
    "@id": "http://purl.obolibrary.org/obo/DOID_8986",
    "rdfs:label": "narcolepsy"
  },
  "Institution": {
    "@value": "Stanford University"
  },
  "Principal Investigator": {
    "@value": "John Doe"
  },
  "Study Type": {
    "@id": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C15273",
    "rdfs:label": "Longitudinal Study"
  },
  // ...
  "schema:isBasedOn": "https://repo.metadatacenter.orgx/templates/6381a0ce-3904-4885-bc44-5caacb4ad0e6",
  "schema:name": "Study metadata",
  "schema:description": "Study template",
  "pav:createdOn": "2017-09-05T09:50:28-0700",
  "pav:createdBy": "https://metadatacenter.org/users/8d787b98-33dd-4aff-a88c-440caf452c61",
  "pav:lastUpdatedOn": "2017-09-05T09:50:28-0700",
  "oslc:modifiedBy": "https://metadatacenter.org/users/8d787b98-33dd-4aff-a88c-440caf452c61",
  "@id": "https://repo.metadatacenter.orgx/template-instances/ffe856e7-d920-480d-a666-009041f609e3"
}
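The instance above carries two kinds of field values: plain literals ("@value") and ontology term references ("@id" plus "rdfs:label"), with each field name bound to a property IRI in "@context". The following sketch shows one way a consumer might read that structure. It uses a trimmed copy of the slide's example (the elided "// ..." parts are left out so that the text is valid JSON), the function name summarize_fields is hypothetical, and it is a plain dictionary lookup, not a full JSON-LD processor.

```python
import json

# Trimmed copy of the instance shown above, without the elided sections.
INSTANCE_JSON = """
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "Title": "http://purl.obolibrary.org/obo/NGS_0000055",
    "Disorder": "http://purl.org/net/OCRe/OCRe.owl#OCRE900086",
    "Study Type": "http://purl.obolibrary.org/obo/NGS_0000056"
  },
  "@type": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C63536",
  "Title": {"@value": "A sample study"},
  "Disorder": {
    "@id": "http://purl.obolibrary.org/obo/DOID_8986",
    "rdfs:label": "narcolepsy"
  },
  "Study Type": {
    "@id": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C15273",
    "rdfs:label": "Longitudinal Study"
  }
}
"""


def summarize_fields(instance: dict) -> dict:
    """Map each field name to its property IRI and its literal or term value."""
    context = instance.get("@context", {})
    summary = {}
    for name, value in instance.items():
        # Skip JSON-LD keywords and any key that has no property IRI in @context.
        if name.startswith("@") or name not in context or not isinstance(value, dict):
            continue
        if "@id" in value:
            kind, content = "term", value["@id"]
        else:
            kind, content = "literal", value.get("@value")
        summary[name] = {
            "property": context[name],
            "kind": kind,
            "value": content,
            "label": value.get("rdfs:label"),
        }
    return summary


if __name__ == "__main__":
    for field, info in summarize_fields(json.loads(INSTANCE_JSON)).items():
        print(field, info)
```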
Value Set Creation
• Lists of permissible values for fields
• Example: Longitudinal study types
– Prospective study
– Retrospective study
– Hybrid design
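As an illustration of what a value set does for a field, the sketch below checks a candidate value against the list of permissible values from the example. The ValueSet class and the case-insensitive matching are assumptions made for this illustration; they do not describe how CEDAR stores or enforces value sets.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValueSet:
    """A named list of permissible values for a field (hypothetical structure)."""
    name: str
    values: Tuple[str, ...]

    def is_permitted(self, candidate: str) -> bool:
        # Assumption: surrounding whitespace and letter case are not significant.
        normalized = candidate.strip().casefold()
        return any(normalized == v.casefold() for v in self.values)


# The example value set from the slide.
LONGITUDINAL_STUDY_TYPES = ValueSet(
    name="Longitudinal study types",
    values=("Prospective study", "Retrospective study", "Hybrid design"),
)

if __name__ == "__main__":
    for candidate in ["Prospective study", "hybrid design ", "Cross-sectional study"]:
        print(candidate, "->", LONGITUDINAL_STUDY_TYPES.is_permitted(candidate))
```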
Class Creation
• Dynamically define a new class and immediately use it
• Optionally link it to existing classes
– Ontology maintainers may use this information to enrich their ontologies
• Example: adductor dorsalis
[Diagram: the new class "adductor dorsalis" in CEDAR Provisional Classes (CEDARPC) is linked by subclassOf to "adductor muscle" in UBERON.]
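The class-creation idea can be sketched as a small in-memory registry: a new class is created on the spot, optionally linked to an existing class, and the links can later be listed per ontology for its maintainers to review. All class and method names here are hypothetical, and labels stand in for IRIs because the slides give no identifiers for these classes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OntologyClass:
    """An existing class, identified here only by label and source ontology."""
    label: str
    source: str


@dataclass(frozen=True)
class ProvisionalClass:
    """A newly created class, optionally linked to an existing class."""
    label: str
    subclass_of: Optional[OntologyClass] = None
    source: str = "CEDARPC"

    def describe(self) -> str:
        text = f"{self.label} ({self.source})"
        if self.subclass_of is not None:
            parent = self.subclass_of
            text += f" subclassOf {parent.label} ({parent.source})"
        return text


class ProvisionalRegistry:
    """Hypothetical in-memory store of provisional classes."""

    def __init__(self) -> None:
        self._classes: Dict[str, ProvisionalClass] = {}

    def create(self, label: str, parent: Optional[OntologyClass] = None) -> ProvisionalClass:
        # The new class is returned immediately so it can be used as a field value.
        new_class = ProvisionalClass(label=label, subclass_of=parent)
        self._classes[label] = new_class
        return new_class

    def suggestions_for(self, source: str) -> List[ProvisionalClass]:
        """Provisional classes linked to a class in the given ontology."""
        return [c for c in self._classes.values()
                if c.subclass_of is not None and c.subclass_of.source == source]


if __name__ == "__main__":
    registry = ProvisionalRegistry()
    parent = OntologyClass(label="adductor muscle", source="UBERON")
    print(registry.create("adductor dorsalis", parent).describe())
```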
Evaluation
• The LINCS Consortium
– Cellular signatures
• ImmPort
– Immunology
• AIRR Community
– Datasets acquired using sequencing
– Submission to NCBI BioSample
• Stanford University Libraries
Summary
• Authoring metadata is hard and time-consuming
• Authoring semantic metadata is even harder
– Lack of convenient tools for linking metadata to ontologies in a metadata authoring workflow
• The CEDAR Workbench facilitates metadata creation in a semantically rigorous way
– Add type and property assertions
– Constrain the values of fields to ontology terms
– Create classes and value sets
http://metadatacenter.org
http://cedar.metadatacenter.net
Thanks!

Supporting Ontology-Based Standardization of Biomedical Metadata in the CEDAR Workbench (ICBO 2017 Conference)
