Towards an Environmental Health Sciences Ontology: CHEAR to HHEAR and Beyond

Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Science
Director RPI Web Science Research Center
RPI Institute for Data Exploration and Application Health Informatics Lead
dlm@cs.rpi.edu, @dlmcguinness
Computable Exposures Workshop September 9, 2019

CHEAR Ontology Effort
The Children’s Health Exposure Analysis Resource, or CHEAR, is a program funded
by the National Institute of Environmental Health Sciences to advance understanding
about how the environment impacts children’s health and development over the
course of a lifetime.
https://chearprogram.org/
Children’s Health Exposure Analysis Resource (CHEAR)
McGuinness 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
CHEAR is composed of three components:
• A National Exposure Assessment Laboratory Network, providing both targeted and untargeted environmental exposure and biological response analyses in human samples
• A Data Repository, Analysis, and Science Center, providing statistical services, a data repository, and data standards for integration and sharing
• A Coordinating Center, connecting the research community to CHEAR resources

The NIEHS is establishing an infrastructure, the Human Health Exposure Analysis
Resource (HHEAR), as a continuation of the Children's Health Exposure Analysis
Resource (CHEAR). The goal of this consortium is to provide the research
community access to laboratory and statistical analyses to add or expand the
inclusion of environmental exposures in their research and to make that data publicly
available as a means to improve our knowledge of the comprehensive effects of
environmental exposures on human health throughout the life course.
Human Health Exposure Analysis Resource (HHEAR): Data Repository, Analysis and Science Center
https://grants.nih.gov/grants/guide/rfa-files/rfa-es-18-014.html
Human Health Exposure Analysis Resource (HHEAR)

Goal: Encode the terminology currently needed by the CHEAR Data Center Portal and publish an open source, extensible ontology integrating general exposure science and health, leveraging best-in-class terminologies.
Enabling Findable, Accessible, Interoperable, Reusable (FAIR) Data and Services to support data analysis and interdisciplinary research
Ontologies encode terms and their interrelationships, providing a foundation for interoperability and reusability (the I and R in FAIR)
Ontology-enabled infrastructures, such as Knowledge Graphs and ontology-enabled search services, also support finding and accessing relevant content (the F and A in FAIR)
Children’s Health Exposure Analysis Resource Ontology
Stingone, Mervish, Kovatch, McGuinness, Gennings, Teitelbaum. Big and Disparate Data: Considerations for Pediatric Consortia. Current Opinion in Pediatrics. 29(2):231-239, April 2017. doi: 10.1097/MOP.0000000000000467. PMID: 28134706

Ontology Development Process
Exemplified by CHEAR (diagram components as labeled):
• Use Cases
• Existing Ontologies & Vocabularies
• Expert Interviews
• Labkey, Ontology Fragments
• Sources: Data Reporting Templates; Data Dictionaries / Codebooks; Foundational Ontologies/Vocabularies
• Extracted Vocabularies, …
• Expert Guidance
• Semantic Extract, Transform, Load (SETLr)
• Generated Ontology
  * domain concepts
  * authoritative vocabularies
  * vetted definitions
  * supporting citations
• Ontology Curation (ongoing) by Reviewers & Curators
  * Ontology Development Team
  * Domain collaborators
  * Invited experts
  * "Consumers" (data analysts)
  * Semi-automated critiques
• Interactive Ontology Browser with Annotation
• Knowledge Graph Integration
  * Linking data and metadata content to domain terms
  * Linking workflows based on semantic descriptions
• Repository Integration
  * Source Datasets
  * Analytics source code
  * Results
  * Publications
• Knowledge-Enhanced Search: finding what is there that might be of use
Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …
McGuinness 9/19 https://chearprogram.org/

Ontology Foundations
Use Cases help scope and prioritize
Key Components
• Summary
• Usage Scenario
• Flow of Events
• Activity Diagram
• Competency Questions
• Resources
• See examples at Ontology Engineering:
https://docs.google.com/document/d/1A2w-xoN5aRwlSoCTEtDsWjs2caYDRD5bANif6icDS6k/edit?usp=sharing
Starting with the Use Case
McGuinness 9/19

Ontology Foundations
Imported Ontologies:
● Semantic Science Integrated Ontology (SIO)
● PROV-O
● Units Ontology
● Human-Aware Science Ontology (HAScO)
● Virtual Solar Terrestrial Observatory (Instruments) (VSTO-I)
● Environment Ontology (ENVO)
● …
Ontologies reused via Minimum Information to Reference an External Ontology Term (MIREOT):
● Chemical Entities of Biological Interest (ChEBI)
● Statistics Ontology (STATO)
● PubChem
● UBERON (Anatomy)
● Disease Ontology (DO)
● UniProt (Proteins)
● CogAt (Cognitive Measures)
● ExO
● RefMet, …
Annotations:
● Simple Knowledge Organization System (SKOS)
● Dublin Core (DC) Terms
CHEAR Ontology Foundations and Reuse
McGuinness 9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McCusker, Rashid, Liang, Liu, Chastain, Pinheiro, Stingone, McGuinness. Broad, Interdisciplinary Science In Tela: An Exposure and Child Health Ontology. In Proceedings of Web Science, 2017. Troy, NY. 349-357.
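MIREOT-style reuse, as listed above, references an external term by recording only a small set of facts about it instead of importing the whole source ontology; the MIREOT guideline names the source ontology IRI, the source term IRI, and the direct superclass in the importing ontology. The Python sketch below is a minimal illustration of that idea, not the CHEAR build process: all IRIs are hypothetical `example.org` placeholders, and recording the source ontology with `rdfs:isDefinedBy` is an illustrative choice.

```python
from dataclasses import dataclass
from typing import List

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

@dataclass(frozen=True)
class MireotReference:
    """Minimum information to reference one external ontology term."""
    source_ontology: str    # IRI of the ontology the term comes from
    source_term: str        # IRI of the reused term itself
    target_superclass: str  # IRI of the parent class in the importing ontology

    def to_ntriples(self) -> List[str]:
        # Place the external term under the local parent class, then record
        # its origin (rdfs:isDefinedBy is an illustrative choice here).
        return [
            f"<{self.source_term}> <{RDFS}subClassOf> <{self.target_superclass}> .",
            f"<{self.source_term}> <{RDFS}isDefinedBy> <{self.source_ontology}> .",
        ]

# Hypothetical IRIs for illustration only; not real ChEBI or CHEAR identifiers.
ref = MireotReference(
    source_ontology="http://example.org/external-onto",
    source_term="http://example.org/external-onto/Lead",
    target_superclass="http://example.org/chear/Metal",
)
for line in ref.to_ntriples():
    print(line)
```

Only the referenced term and its placement travel into the importing ontology, which keeps the import small even when the source ontology is large.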

Mapping Data to Meaning: Semantic Data Dictionaries
Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary Approach to Data Annotation and Integration. Enabling Open Semantic Science, in Proc. of the International Web Science Conference Knowledge Graphs Workshop, Oct 21, 2017
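A semantic data dictionary pairs each column of a dataset with ontology terms describing what the column measures, which entity it describes, and its unit, so that rows can be turned into meaningful statements. The sketch below illustrates that mapping under stated assumptions: the column names (`bw`, `magE`), the `ex:` terms, and the output tuple layout are all hypothetical, and this is not the published Semantic Data Dictionary format.

```python
from typing import Dict, List, Tuple

# Hypothetical semantic data dictionary: column -> attribute, entity, unit.
# The ex: terms are placeholders, not CHEAR ontology identifiers.
SDD: Dict[str, Dict[str, str]] = {
    "bw":   {"attribute": "ex:BirthWeight", "entity": "ex:Child",  "unit": "ex:gram"},
    "magE": {"attribute": "ex:Age",         "entity": "ex:Mother", "unit": "ex:year"},
}

def annotate_row(row_id: str, row: Dict[str, object],
                 sdd: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, object, str]]:
    """Turn one data row into (subject, attribute, value, unit) records."""
    records = []
    for column, value in row.items():
        entry = sdd.get(column)
        if entry is None:
            continue  # column not yet described in the dictionary; skip it
        # One subject per entity per row, e.g. the child and the mother of record 001.
        subject = f"{entry['entity']}/{row_id}"
        records.append((subject, entry["attribute"], value, entry["unit"]))
    return records

print(annotate_row("001", {"bw": 3250, "magE": 31, "notes": "free text"}, SDD))
```

Columns missing from the dictionary (here `notes`) are skipped, which is one simple way to surface variables that still need mapping.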

Content Pipeline
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01 and IBM AI Horizons Network
• Increasing levels of automation
• Reproducible, traceable transformations from diverse sources
• Transformation to rigorously modeled knowledge
• Versioning supported by provenance
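The pipeline goals above (reproducible, traceable transformations with versioning supported by provenance) can be illustrated by attaching a provenance record to each transformation output. This is a minimal sketch, assuming content hashes as identifiers and a hypothetical transform identifier; the keys reuse real PROV-O property names (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:generatedAtTime`), but the dictionary layout and the `urn:sha256:` prefix are illustrative, not the RPI pipeline's actual output.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict

def content_id(data: bytes) -> str:
    """Illustrative content-addressed identifier: same bytes -> same ID."""
    return "urn:sha256:" + hashlib.sha256(data).hexdigest()

def provenance_record(source: bytes, output: bytes, transform_id: str) -> Dict[str, str]:
    """Describe one transformation step using PROV-O property names as keys."""
    return {
        "entity": content_id(output),
        "prov:wasDerivedFrom": content_id(source),
        "prov:wasGeneratedBy": transform_id,
        "prov:generatedAtTime": datetime.now(timezone.utc).isoformat(),
    }

raw_v1 = b"id,bw\n001,3250\n"
raw_v2 = b"id,bw\n001,3205\n"  # a corrected version of the same source file
rec1 = provenance_record(raw_v1, b"<converted v1>", "hypothetical-transform-v1")
rec2 = provenance_record(raw_v2, b"<converted v2>", "hypothetical-transform-v1")
# A changed source yields a different derivation link, i.e. a new version.
print(rec1["prov:wasDerivedFrom"] != rec2["prov:wasDerivedFrom"])
```

Hashing content rather than file names means re-running the same input reproduces the same derivation link, while any edit to the source shows up as a distinct version.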

• Ontology support for mapping and integration (e.g., education level)
• Ontology informs decisions about variables that may be combined, serve as proxies, or be used to derive desired info (e.g., birth outcomes)
• Ontology integrity constraints may help flag errors (e.g., APGAR > 10)
• Ontology helps expose implicit information and find links
Figure: Fenton Z-Score, derived from Sex, Birth weight, and Gestational Age

Mother’s Highest Education Level      Val
Did not attend school                  0
Elementary school                      1
Technical post-primary                 2
Middle school                          3
Technical post-middle school           4
High school or junior college          5
Technical post-junior college          6
College                                7
Graduate                               8
Doesn’t know                           9

Mother Education                      Val
Less than High School                  0
High School Graduate or More           1
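The two education coding schemes above differ in granularity, so pooling them requires mapping the ten-level codes onto the two-level scale. A minimal sketch, assuming a cut point the slide does not state: codes 0 to 4 are treated as less than high school, 5 to 8 as high school graduate or more, and 9 ("Doesn’t know") as missing. Whether "Technical post-middle school" (4) counts as high school is exactly the kind of judgment the ontology is meant to inform.

```python
from typing import Optional

LESS_THAN_HIGH_SCHOOL = 0
HIGH_SCHOOL_OR_MORE = 1
DOES_NOT_KNOW = 9

def harmonize_education(code: int) -> Optional[int]:
    """Map the 10-level maternal education code to the 2-level scale.

    Assumed cut point (not from the source): 0-4 -> less than high school,
    5-8 -> high school graduate or more, 9 ("Doesn't know") -> missing.
    """
    if code == DOES_NOT_KNOW:
        return None  # unknown is treated as missing, not as a category
    if 0 <= code <= 4:
        return LESS_THAN_HIGH_SCHOOL
    if 5 <= code <= 8:
        return HIGH_SCHOOL_OR_MORE
    raise ValueError(f"unexpected education code: {code}")

print([harmonize_education(c) for c in [0, 4, 5, 8, 9]])  # [0, 0, 1, 1, None]
```

Raising on unexpected codes, rather than silently binning them, keeps coding errors visible when pooling studies.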
Supports Browsing, Searching, Pooling, Deriving Values, Verification, …
McGuinness, McCusker, Pinheiro, Stingone, et al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
McGuinness 9/19
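The integrity-constraint bullet above (flagging APGAR > 10) can be sketched as a range check keyed by ontology term. The Apgar score sums five criteria each scored 0 to 2, so valid values run from 0 to 10. The constraint table and the `ex:ApgarScore` term below are hypothetical; a production system would more likely express such limits in the ontology or in a shape language such as SHACL rather than in a Python dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical range constraints keyed by ontology term. An Apgar score is an
# integer from 0 to 10 (five criteria, each scored 0-2).
CONSTRAINTS: Dict[str, Tuple[int, int]] = {"ex:ApgarScore": (0, 10)}

def flag_violations(records: List[Tuple[str, str, int]],
                    constraints: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str, int]]:
    """Return (record_id, term, value) for values outside the allowed range."""
    flagged = []
    for record_id, term, value in records:
        bounds = constraints.get(term)
        if bounds is None:
            continue  # no constraint known for this term
        low, high = bounds
        if not (low <= value <= high):
            flagged.append((record_id, term, value))
    return flagged

data = [("c1", "ex:ApgarScore", 9), ("c2", "ex:ApgarScore", 12), ("c3", "ex:ApgarScore", 0)]
print(flag_violations(data, CONSTRAINTS))  # [('c2', 'ex:ApgarScore', 12)]
```

Because the check is keyed by term rather than by column name, the same constraint applies to every study whose variables are mapped to that term.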

Selection Options Abound…
McGuinness 9/19 Spreadsheet with McCusker Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
https://docs.google.com/spreadsheets/d/1ZSIl_4XTa78ZIqWJaKr-dFUNEChdU_5OScNdP4ALkxQ/edit#gid=41122928

Guidance Criteria
https://docs.google.com/spreadsheets/d/1ZSIl_4XTa78ZIqWJaKr-dFUNEChdU_5OScNdP4ALkxQ/edit#gid=707101447

Automatic ingest, access control, data governance, precision download, …
Supports search over study, data, sample, subject, …
Enables smart queries, e.g., find
• Child: BirthWt, Gender, Gestational Age at Birth
• Mother: Age, BMI (“early in pregnancy based on inclusion criterion for the particular study”), Parity, Education
• Metals: As, Cd, Mn, Mo, Pb
Ontology-Enabled CHEAR Human-Aware Data Acquisition Framework
McGuinness 12/18
Pinheiro, Santos, Liang, Liu, Rashid, McGuinness, Bax. HADatAc: A Framework for Scientific Data Integration using Ontologies. Intl Semantic Web Conference, Monterey, CA, 2018.
Sample Question: Gennings
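The smart-query example above (child birth weight, gestational age, maternal covariates, metals) depends on each study's local variable names being mapped to shared ontology terms. The sketch below shows that idea in miniature: a query names terms, and any study whose annotated variables cover all of them is returned with its matching local columns. The study names, column names, and `ex:` terms are hypothetical, and this is not HADatAc's implementation.

```python
from typing import Dict, List, Set

# Hypothetical study annotations: local column name -> shared ontology term.
STUDIES: Dict[str, Dict[str, str]] = {
    "study_a": {"bwt": "ex:BirthWeight", "ga_wk": "ex:GestationalAgeAtBirth",
                "pb_ug": "ex:BloodLeadLevel"},
    "study_b": {"birth_weight": "ex:BirthWeight", "gest_age": "ex:GestationalAgeAtBirth"},
}

def find_studies(required: Set[str],
                 studies: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return studies covering every requested term, with the local columns that match."""
    hits = {}
    for study, mapping in studies.items():
        if required <= set(mapping.values()):
            hits[study] = sorted(col for col, term in mapping.items() if term in required)
    return hits

print(find_studies({"ex:BirthWeight", "ex:GestationalAgeAtBirth"}, STUDIES))
print(find_studies({"ex:BirthWeight", "ex:BloodLeadLevel"}, STUDIES))
```

The first query matches both studies even though they name the columns differently; the second matches only `study_a`, the one with a mapped lead measurement.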

Ontology-Enabled Study Search / Precision Data Search and Download
Example search results, each listing fields such as Institution, Principal Investigator(s), Number of Subjects, Number of Samples, Study Description, and Keywords:
• Blood Biomarkers for Children’s Health (Study 1)
• Urine Biomarkers for Children’s Health (Study 2)
• Metabolomic Biomarkers for Children’s Health (Study 3)
McGuinness, Pinheiro et al, 9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Ontology Evolution Strategy
Diagram elements, as labeled:
• Identify terms that can be mapped to existing ontology
• Identify terms to be added to ontology
• Describe new terms w/ definitions and location within existing ontology
• Mappings (e.g. variable names) incorporated into knowledge graph
• Data into knowledge graph after embargo period
• Incorporate new terms into existing ontology
• Review and revise updates with stakeholders
• Compile new terms across multiple studies (e.g. Quarterly)
Groups: Data Center; Data Structures & Standards Working Group
Output: New version, Ontology X
McGuinness with Stingone Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
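The first steps of the evolution strategy (identifying terms that map to the existing ontology, identifying terms to be added, and compiling new terms across studies) can be sketched as a label lookup that collects unmatched labels across studies. This is a simplified illustration under stated assumptions: matching is by normalized label or synonym only, and the index, study names, and `ex:` IRIs are hypothetical; real curation also involves the stakeholder review shown in the diagram.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical index of the existing ontology: normalized label/synonym -> term.
ONTOLOGY_INDEX: Dict[str, str] = {
    "birth weight": "ex:BirthWeight",
    "maternal age": "ex:MaternalAge",
}

def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())

def triage(study_labels: Dict[str, List[str]], index: Dict[str, str]
           ) -> Tuple[Dict[Tuple[str, str], str], Dict[str, Set[str]]]:
    """Split labels into mapped terms and candidate new terms.

    Candidates are compiled across studies: normalized label -> studies using it.
    """
    mapped: Dict[Tuple[str, str], str] = {}
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for study, labels in study_labels.items():
        for label in labels:
            key = normalize(label)
            if key in index:
                mapped[(study, label)] = index[key]
            else:
                candidates[key].add(study)
    return mapped, dict(candidates)

mapped, new_terms = triage(
    {"study_a": ["Birth Weight", "Cotinine"], "study_b": ["maternal  age", "cotinine"]},
    ONTOLOGY_INDEX,
)
print(mapped)
print(new_terms)
```

Grouping candidates by normalized label shows that both studies need the same new term (`cotinine`), so it can be proposed once in the periodic compilation rather than twice.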

Evolution Plans
McGuinness Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• HHEAR to start soon (Sept 1)
• HHEAR needs to be backwards compatible with CHEAR
• Much of CHEAR is not CHEAR- or HHEAR-specific and in fact is reused by other health informatics efforts
• Plans to extract a “reusable core” aimed at environmental health sciences needs

Status
McGuinness et al, 9/9/19 Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• CHEAR available:
• http://www.hadatac.org/chear-ontology/
• http://www.hadatac.org/ont/chear/
• https://bioportal.bioontology.org/ontologies/CHEAR
• One more release expected (before HHEAR)
• The CHEAR namespace will continue to work; however, we will encourage using the HHEAR namespace
• Refactoring coming

Some TakeAways
• Large Community Effort
• Vocabulary Initially Designed by Semantics Experts
• Processes co-designed by domain experts and semantics experts
• Vocabulary evolution managed by the community
• Leverages MANY community vocabularies
• Has evolving prioritization AND criteria for choice
• Supports a wide range of services that we claim would not be supported without an interlinked, human-supervised vocabulary

Community Conversation
McGuinness 9/19
• We would welcome users, collaborators, requirements providers, etc.
• We aim to follow the principles in the Ontology Engineering book
• Questions? dlm@cs.rpi.edu
Thanks to many: RPI Tetherless World team particularly
McCusker, Erickson, Hendler, Pinheiro, Rashid, Liang, Liu,
Chastain, Chari; RPI: Bennett, Dyson, Seneviratne; Mount Sinai
particularly Teitelbaum, Stingone, Mervish, Gennings, Kovatch;
IBM, particularly Das, Chen, Brown, ….
Funding: NIEHS 0255-0236-4609 / 1U2CES026555-01, DARPA
HR0011-16-2-0030, IBM-RPI HEALS AI Horizons Network, NSF
ACI-1640840, National Spectral Consortium, …

Discussion
• Taxonomies/Ontologies enable FAIR (Findable,
Accessible, Interoperable, Reusable) Data
Resources
• Use ontology-enabled architectures
• Do NOT build taxonomies/ ontologies from scratch
• Selectively and thoughtfully reuse existing best
practice ontologies/vocabularies
• Leverage others’ mappings and selection criteria where possible
• Engage experts in choosing ontology portions and in designing the knowledge architecture
• Ecosystems and diverse teams are critical for success – community-driven and community-maintained ontology-based systems are the future
• Flexible, Provenance-Aware, and Certainty-Aware Knowledge Graphs are also the future
DLM@CS.RPI.EDU
McGuinness

What is an Ontology?
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Relationships among concepts and individuals
• Sentences distinguishing concepts, refining
definitions & relationships
relevant to a particular domain or area of interest.
* Based on AAAI ’99 Ontologies Panel: McGuinness, Welty, Uschold, Gruninger, Lehmann
McGuinness 6/7/2017
• "Pull" for Ontologies. Invited talk. Semantics for the Web. Dagstuhl, Germany, 2000.
• Ontologies Come of Age. Fensel, Hendler, Lieberman, Wahlster, eds. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
McGuinness 12/18
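The definition above (terms, relationships among them, and statements that refine their meaning) can be made concrete with a toy example: a handful of terms linked by "is-a" (subclass) relationships, plus a function that follows those links transitively so that a question about a general class also covers its more specific kinds. The class names are hypothetical and chosen only for illustration; a real ontology would be built with an OWL or RDF toolkit rather than a Python dictionary.

```python
from typing import Dict, Set

# Toy ontology: each term maps to its direct superclasses ("is-a" links).
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Lead": {"HeavyMetal"},
    "Cadmium": {"HeavyMetal"},
    "HeavyMetal": {"ChemicalEntity"},
    "Cotinine": {"ChemicalEntity"},
}

def ancestors(term: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """All superclasses reachable from term (transitive closure of is-a)."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_a(term: str, other: str, subclass_of: Dict[str, Set[str]]) -> bool:
    """True if term is other, or other is one of term's superclasses."""
    return term == other or other in ancestors(term, subclass_of)

print(sorted(ancestors("Lead", SUBCLASS_OF)))       # ['ChemicalEntity', 'HeavyMetal']
print(is_a("Cotinine", "HeavyMetal", SUBCLASS_OF))  # False
```

Nothing states directly that Lead is a ChemicalEntity; the answer follows from the relationships, which is the kind of implicit information an ontology lets a system expose.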

RPI knowledge graph technology curates from diverse sources
Sources include Text Mining, Linked Data, Biomedical Databases, Omics and Epidemiology Datasets, and File uploads of any kind
• Fully automated
• Reproducible and traceable from diverse data types: Tabular CSV & XLS, XML, JSON, HTML, etc.
• Transformed into rigorously modeled knowledge: Meaning as structure, global IDs, foundational ontologies
• Tracked and versioned using provenance standards
McGuinness 12/18 with McCusker et al. partially supported through IBM AI Horizons Network

Whyis Knowledge Graph Framework Current Usage
McCusker, McGuinness 12/18
– HEALS Project: clinical guidelines, cancer restaging, etc.
– Nanomaterials “genome” – NanoMine to MetaMine
– Radio Spectrum Policies
– Biology Knowledge Graph
– Knowledge Graph Catalog
– …. Your use here

McGuinness: Historical Perspective (Semantics, Standards, Semantically-enabled applications, …)
McGuinness 11/18
* Based on AAAI ’99 Ontologies Panel: McGuinness, Welty, Uschold, Gruninger, Lehmann
