The document discusses the researcher's work using data integration methods to better understand complex diseases like obesity, asthma, and cancer that disproportionately impact racial and gender groups. Over the past 8 years, their research has focused on applying the endotype identification in complex diseases (EICD) framework to identify disease subtypes using different types of omics data along with established disease indicators. This approach has identified potential biomarkers for childhood asthma and is currently being used to study a type of childhood obesity related to maternal factors to help predict risk and identify biomarkers. The EICD framework is presented as a generalizable method for mechanistic disease subtype discovery that could help predict, diagnose, and precisely treat chronic diseases.
Data integration methods for identifying endotypes and biomarkers of complex diseases
The era of lower-cost, high-throughput omic data technologies and electronic medical records
provides a rich environment for data-driven discoveries in disparities of chronic complex
diseases such as obesity. However, the scale, complexity, and heterogeneous
nature of the data present both practical and conceptual challenges in the integration of these
data types. At the same time, data integration methods are being developed at an unprecedented rate.
My primary research interest is in the application, development, and specification of data
integration methods to better understand the prevention, early identification, and treatment of
complex and chronic diseases with racial and sex disparities.
Over the past 8 years, I have focused on the application, development, and specification of data
integration methods to better understand mechanisms and the occurrence of childhood
asthma, childhood obesity, breast cancer and cardio-metabolic diseases. While many
approaches exist, my research has consistently focused on the endotype identification in
complex diseases (EICD) knowledge discovery framework (Williams-DeVane 2016 in
preparation), where the goal is to identify data driven subtypes of complex and/or chronic
diseases that lead to mechanistic understanding and/or probable biomarkers of disease. In an
iterative manner, we analyze different omic data types using established disease indicators, e.g.
Body Mass Index (BMI) for obesity, to determine the complex disease status and then apply
traditional statistical analysis methods to identify probable biomarkers. Because of the
nature of large-scale heterogeneous human subject data, there is rarely enough power to
confidently identify biomarkers of disease. As a first step in knowledge discovery, however, we
can learn about each data type's contribution to the disease and eliminate invariant variables
from the analysis. We then explore alternative and/or race- and sex-adjusted disease indicators
and repeat the statistical analysis with the goal of identifying improved disease indicators.
Next, we explore additional data types independently. At any time in the knowledge discovery
process, endotype identification methods can be applied to identify subtypes of the complex
disease. As more data becomes available, indicators of disease improve, more characteristics of
the data types are learned, endotypes improve, mechanistic understanding is gained, and
probable biomarkers of disease are identified.
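The first pass of the iterative process described above can be sketched in a few lines of
Python. This is only a minimal illustration under stated assumptions, not the EICD
implementation: disease status is assumed to be a binary indicator already derived for each
subject, Welch's t statistic stands in for the "traditional statistical analysis methods"
(the specific tests are not named here), and the variable names and values are hypothetical
toy data.

```python
from math import sqrt
from statistics import mean, variance


def drop_invariant(features):
    """Keep only variables whose values differ across subjects."""
    return {name: vals for name, vals in features.items() if len(set(vals)) > 1}


def welch_t(cases, controls):
    """Welch's t statistic for two independent groups (needs >= 2 values per group)."""
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / se if se > 0 else 0.0


def rank_candidates(features, status):
    """Rank non-invariant variables by |t| between indicator-positive and -negative subjects."""
    scores = {}
    for name, vals in drop_invariant(features).items():
        cases = [v for v, s in zip(vals, status) if s]
        controls = [v for v, s in zip(vals, status) if not s]
        scores[name] = welch_t(cases, controls)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: six subjects, status 1 = disease indicator positive.
status = [1, 1, 1, 0, 0, 0]
features = {
    "marker_a": [5.1, 4.9, 5.3, 2.0, 2.2, 1.9],
    "marker_b": [1.0, 1.2, 0.9, 1.1, 1.0, 1.2],
    "marker_c": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],  # invariant, removed before testing
}
ranking = rank_candidates(features, status)
for name, t in ranking:
    print(f"{name}: t = {t:.2f}")
```

In this sketch, swapping in an alternative or adjusted disease indicator only changes the
status list, so the same ranking step can be repeated on each iteration. With real cohort
sizes, any such ranking would also need multiple-testing correction before a variable is
treated as a probable biomarker.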
The first application of this knowledge discovery framework was during my postdoctoral fellowship at the
Environmental Protection Agency in Dr. Stephen Edwards's research program to identify
endotypes of childhood asthma. Several endotypes of childhood asthma were identified in a
mostly African American population of adolescents leading to insights about multi-omic data
integration (Williams-DeVane et al. 2014) as well as probable biomarkers of childhood asthma
(George et al. 2015). The full EICD knowledge discovery paradigm is currently being applied
to a specific type of childhood obesity that we have termed Maternal
Mediated Childhood Obesity (MMCO), where the goal is to better predict childhood obesity
based on the epigenetics of cord blood, maternal environmental variables, and early childhood
growth patterns. In addition, metabolomic and microbiomic data types are considered.
Within the EICD knowledge discovery framework, we have developed tools specific to the
use of electronic medical records, including MonoInc (Josey et al. 2016 in preparation); identified
preliminary sex- and race-adjusted disease indicators of MMCO (Williams-DeVane et al. 2016 in
preparation); and started to identify endotypes of MMCO that will eventually lead to the
identification of probable biomarkers of disease.
The EICD knowledge discovery framework is a generalizable framework that can be applied to
many diseases and complex problems. At the core of the method are deep phenotyping and
deep learning to identify mechanistically distinct subtypes (endotypes) of complex and/or
chronic diseases. These methods have broad applications and the potential to lead
to more affordable healthcare through the prediction, early identification, and precise
treatment of chronic diseases, particularly in populations affected by health disparities.