1. Standardization of the HIPC Data
Templates: The Story So Far
Ahmad C. Bukhari, Ph.D., Kei-Hoi Cheung, Ph.D. and Steven H. Kleinstein, Ph.D.
Yale University, School of Medicine
User Group
(HIPC)
2. ImmPort
● An important resource for raw data and protocols from clinical trials,
mechanistic studies and novel methods for cellular and molecular
measurements
● Provides templates and standard operating procedures to facilitate data
representation and transfer.
● Provides a variety of tools for data access and manipulation
[Figure label: SQL dump for local hosting]
3. Human Immunology Project Consortium (HIPC)
● Well-characterized human cohorts are studied using a variety of modern
analytic tools including multiplex transcriptional, cytokine, and proteomic
assays.
● HIPC submitted data is an important subset of the ImmPort database
● Submitted HIPC data is not standardized.
● Inconsistent naming and data reporting
4. Our aim is to make HIPC data FAIR
● Findability
○ Finding a large variety of related datasets is an important step to knowledge discovery
● Accessibility
○ A growing number of datasets are being submitted to public repositories such as ImmPort.
These datasets can be accessed through different methods including web-based search, bulk
download and API access
● Interoperability
○ Data mining/analysis often requires multiple datasets to be integrated within a single repository
or across multiple repositories
● Reusability
○ Entering enough metadata as part of the data submission process facilitates data reuse
❖ FAIR is a set of Digital Object Compliance principles that describe the properties of digital objects
defined under the NIH Commons initiative
5. Current practices towards data FAIRness
● Minimum information standards (checklists) specify the minimum amount of
information (metadata) needed for reporting results in a reproducible and
reusable fashion. For example,
○ MIAME: Minimum information about a microarray experiment
○ MIAPE: Minimum Information About a Proteomics Experiment
● Scientific communities have developed templates incorporating detailed
checklists of the metadata needed to describe particular types of
experimental data sources.
● Standard identifiers/terminologies/ontologies have been created for different
domains
6.
7. We propose an ontological mapping for the
ImmPort data submission templates.
● Ontology term mapping makes it possible to achieve semantic normalization across
different repositories.
● Ontologically annotated datasets allow context-aware queries and data
integration
● Mapping to controlled vocabularies, relationships and rules facilitates
run-time data validation.
● These help achieve data FAIRness.
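The run-time validation point can be illustrated with a minimal sketch. It assumes a small in-memory controlled vocabulary; the column names come from the templates discussed in this talk, but the allowed values and the `validate_row` helper are hypothetical and stand in for ontology-backed value sets.

```python
# Hypothetical controlled vocabulary: column name -> allowed values.
# In a real pipeline these sets would be derived from ontology-backed value sets.
ALLOWED_VALUES = {
    "biosample type": {"whole blood", "pbmc", "serum"},
    "measurement technique": {"elisa", "flow cytometry", "luminex"},
}


def validate_row(row):
    """Return a list of problems found in one submitted template row."""
    problems = []
    for column, value in row.items():
        allowed = ALLOWED_VALUES.get(column)
        if allowed is None:
            continue  # this sketch has no controlled vocabulary for the column
        if value.strip().lower() not in allowed:
            problems.append(f"{column}: '{value}' is not a controlled term")
    return problems


if __name__ == "__main__":
    print(validate_row({"biosample type": "PBMC", "measurement technique": "Elisa"}))
    print(validate_row({"biosample type": "blood (fresh)"}))
```

Matching is case-insensitive here, which is one possible choice; a stricter validator could require the exact preferred label.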
8. Ontology mapping of templates
[Workflow figure with six numbered steps. Labels: Ontology Recommender; Concept mapper; OBI, OBO, Cell, PR; A collection of ontologies; Terms Suggestion; Retrieve annotation (concept URI, definitions, etc.); Expert Verification; Suggested Alteration; Finalizing Mapping; Incorporate into CEDAR and ImmPort]
10. Our mapping strategy
• For certain value sets, such as cell populations and cytokines, the concept mapper (CM) maps
the values to domain-specific ontologies such as the Cell Ontology (CL) and the
Protein Ontology (PR)
• For other elements, CM maps them to terms in the Ontology for
Biomedical Investigations (OBI)
• For elements that do not have matches in OBI, we map these elements to
terms in top-ranked OBO Foundry ontologies
• For elements that do not have any ontology term matches, we perform
a manual search in BioPortal and other available repositories for these missing
terms.
• We work closely with individual ontology groups (e.g., CL, OBI) to fill the
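The tiered strategy on this slide can be sketched as a chain of lookups. This is an illustration under stated assumptions, not the actual concept mapper: the lookup tables, the `map_element` function, and every identifier below are hypothetical placeholders, and which tier a given element lands in is invented for the example.

```python
# Illustrative lookup tables. All identifiers are placeholders, not real ontology IDs.
DOMAIN_ONTOLOGIES = {
    "cell population": {"b cell": "CL:PLACEHOLDER_1"},
    "cytokine": {"interleukin-6": "PR:PLACEHOLDER_2"},
}
OBI_TERMS = {"measurement technique": "OBI:PLACEHOLDER_3"}
OTHER_OBO_TERMS = {"biosample type": "OBO:PLACEHOLDER_4"}


def map_element(name, value_set_kind=None):
    """Return (term_id, source) for a template element, trying each tier in order."""
    key = name.strip().lower()
    # Tier 1: values from certain value sets go to a domain-specific ontology.
    if value_set_kind in DOMAIN_ONTOLOGIES:
        term = DOMAIN_ONTOLOGIES[value_set_kind].get(key)
        if term is not None:
            return term, "domain ontology"
    # Tier 2: other elements are looked up in OBI.
    if key in OBI_TERMS:
        return OBI_TERMS[key], "OBI"
    # Tier 3: fall back to other OBO Foundry ontologies.
    if key in OTHER_OBO_TERMS:
        return OTHER_OBO_TERMS[key], "other OBO Foundry ontology"
    # Tier 4: nothing matched, so the element is queued for a manual search.
    return None, "manual search"


if __name__ == "__main__":
    print(map_element("B cell", "cell population"))
    print(map_element("Measurement technique"))
    print(map_element("unlisted element"))
```

In this sketch a value that is missing from its domain ontology falls through to the later tiers; the slides do not say how that case is handled, so that behavior is an assumption.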
11. Template elements mapped to ontologies
• Assay types (e.g., gene expression, flow cytometry, ELISA,
HAI, Luminex)
• Template types (e.g., human subject, biosample)
• Column names (e.g., biosample type, measurement
technique)
• Value sets (e.g., set of cell populations, set of measurement
techniques)
12. Mapping Statistics
Assay Type                  # Templates  # Sub-Templates  # Concept  # Value Set
Microarray gene expression  6            10               113        209
Flow cytometry              6            -                67         262
ELISA                       2            -                39         602
HAI                         2            -                37         117
Luminex                     7            -                102        1032
General                     6            -                115        190
13. OBI
Newly added
A device that moves charged particles through a .... OBI_0001121
A cytometry assay in which the presence of molecules OBI_0002115
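Term identifiers such as the two shown here are usually exchanged as resolvable URIs. The sketch below builds an OBO Foundry PURL from an identifier written with either an underscore or a colon. The `to_obo_purl` helper is hypothetical; the base URL follows the standard OBO PURL pattern.

```python
import re

# Standard OBO Foundry PURL prefix.
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# Accepts identifiers such as "OBI_0001121" or "OBI:0001121".
_ID_PATTERN = re.compile(r"^([A-Za-z]+)[_:](\d+)$")


def to_obo_purl(term_id):
    """Convert an OBO-style term identifier into its PURL form."""
    match = _ID_PATTERN.match(term_id.strip())
    if match is None:
        raise ValueError(f"not an OBO-style identifier: {term_id!r}")
    prefix, number = match.groups()
    return f"{OBO_PURL_BASE}{prefix}_{number}"


if __name__ == "__main__":
    for term in ("OBI_0001121", "OBI:0002115"):
        print(term, "->", to_obo_purl(term))
```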
14. CEDAR helps to generate ontology-linked metadata
Use case: CEDAR immunology data submission
templates
15. CEDAR has employed our suggested mapping
[Screenshot annotations]
• Map to cell term in Cell Ontology
• Manual mapping to “assay” in OBI
• Automatic mapping with NCIT
• Automatic mapping with OBI
https://cedar.metadatacenter.net
16. Future plan
• Refine mapping of new assay types with an updated
algorithm.
• Map clinical metadata to ontology terms.
• Incorporate our ontology-term mapping approach into
CEDAR and ImmPort
• Submit missing terms to relevant ontologies (e.g., OBI)
17. Acknowledgment
• ImmPort
• Jeff Wiser, Patrick Dunn
• Yale
• Hailong Meng, Subhasis Mohanty
• Cell Ontology
• Alex Diehl
• NCBO BioPortal and CEDAR
• Mark Musen, John Graybeal, Martin O’Connor
• OBI
• Bjoern Peters