The document discusses the need for informatics methods, databases, and standards to support exposome-driven discovery research, much as informatics has supported genomic research. Specifically, it notes that heritability estimates from twin studies indicate that environmental factors likely play a role as important as genetics in many traits and diseases. However, the chemical space of the exposome is large and heterogeneous, which makes it challenging to integrate exposome, genome, and phenome data through approaches such as exposome-wide association studies.
Informatics and data analytics to support exposome-based discovery
1. Informatics and data analytics to support
exposome-based discovery
Perspectives from a NIEHS workshop
Chirag J Patel
International Society of Exposure Science
Henderson, NV (by way of Boston, MA)
10/20/15
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
2. Arjun Manrai (Harvard)*
Yuxia Cui (NIEHS)
Pierre Bushel (NIEHS)
Molly Hall (Penn State, now U Penn)*
Spyros Karakitsios (Aristotle U, Greece)
Carolyn Mattingly (NCSU)
Marylyn Ritchie (Geisinger Health/Penn State)
Charles Schmitt (NIEHS)
Denis Sarigiannis (Aristotle U, Greece)
Duncan Thomas (USC)
David Wishart (U Alberta, Canada)
David Balshaw (NIEHS)
The workgroup discussed informatics capability for
high-throughput exposome research
(late 2014 to early 2015)
3. We are now in the era of high-throughput
biology and biomedicine.
(it is now possible to assay thousands to millions of datapoints)
4. We are now in the era of high-throughput
biology and biomedicine: examples of genomic advances
genetic arrays
gene expression
common genetic variants
epigenome (methylation)
whole genome sequencing (WGS)
full genome sequencing
mRNA-seq
epigenome (3D, histone)
3 x 10^9 nucleotide bases
3-4 x 10^4 genes
10^6 to 10^7 variants
5. Informatics has enabled discovery in genomics investigations.
1. infrastructure/standards,
2. analytics,
3. databases
7. Analytic methods have enabled discovery in genomics
(example: genome-wide association [GWAS])
A search engine for genetic influence in phenotypes
Genome-wide association studies (GWASs)
[Figure: Manhattan plots in two panels, unconditional analysis (top) and conditional analysis (bottom). x-axis: chromosome (1 to 22, X); y-axis: -log10(P). Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 x 10^-5); association in identified or established region (P < 1 x 10^-4). Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR.]
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10^-5), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10^-4).
Voight et al, Nature Genetics 2012
N=8K T2D, 39K Controls
GWAS in Type 2 Diabetes
8. 758,000 individuals
>400 studies
>>1B datapoints (genotypes and phenotypes)
>950 manuscripts (Paltoo et al., Nature Genetics 2014)
Accessible data repositories have enabled discovery in genomics
investigation:
(ex: Databases of Genotypes and Phenotypes)
9. We claim that there is a need for informatics analytic methods,
databases, and standards for exposome-driven discovery.
EWAS akin to GWAS?
14. [Figure: chart of heritability estimates by trait; x-axis ticks 0, 25, 50, 75, 100.]
Traits shown: Eye color; Hair curliness; Type-1 diabetes; Height; Schizophrenia; Epilepsy; Graves' disease; Celiac disease; Polycystic ovary syndrome; Attention deficit hyperactivity disorder; Bipolar disorder; Obesity; Alzheimer's disease; Anorexia nervosa; Psoriasis; Bone mineral density; Menarche, age at; Nicotine dependence; Sexual orientation; Alcoholism; Lupus; Rheumatoid arthritis; Crohn's disease; Migraine; Thyroid cancer; Autism; Blood pressure, diastolic; Body mass index; Depression; Coronary artery disease; Insomnia; Menopause, age at; Heart disease; Prostate cancer; QT interval; Breast cancer; Ovarian cancer; Hangover; Stroke; Asthma; Blood pressure, systolic; Hypertension; Osteoarthritis; Parkinson's disease; Longevity; Type-2 diabetes; Gallstone disease; Testicular cancer; Cervical cancer; Sciatica; Bladder cancer; Colon cancer; Lung cancer; Leukemia; Stomach cancer.
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
Type 2 Diabetes (25%)
Heart Disease (25-30%)
Autism (50%???)
Gaugler et al, Nature Genetics (2014)
15. [Figure: the same heritability chart and trait list as on the previous slide; x-axis ticks 0, 25, 50, 75, 100.]
Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com
H2 estimates for complex traits are low and variable:
massive opportunity for high-throughput E research
H2 < 50%
17. What is the potential chemical (external and internal) space of the
exposome?: perhaps on the order of thousands.
>84,000: TSCA and EPA Inventory (2014)
>13,000: Comparative Tox DB (Davis et al, 2015)
3,600 toxicants + 1,634 drugs: Toxic Exposome Database (Wishart et al, 2015)
100-1,000?: uBiome
18. What will the exposome data structure look like?:
a high-dimensioned 3D matrix of (1) exposure measurements
on (2) individuals as a function of (3) time
[Figure: schematic of the exposome data matrix. Axes: individuals, exposome factors, and time. Exposome factors shown: pollutants, diet, metabolites, gut flora, xenobiotics, drugs, mixtures of exposures. Other labels: CVD; genome (static) with GWAS, RVAS, pathway analysis, etc.; EWAS, PheWAS, etc.; integrative; panels (A), (B), (C). Callout: nutrient value for individual i.]
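One way to picture this structure in code is a nested array indexed by individual, exposure factor, and time point. The sketch below is illustrative only: the identifiers, factor names, and values are hypothetical, and a real analysis would likely use an array library rather than nested lists.

```python
from statistics import mean

# Hypothetical axis labels (not from the talk).
individuals = ["id_001", "id_002", "id_003"]
exposures = ["pm25", "serum_lead", "dietary_fiber"]
timepoints = [0, 1]  # e.g. two study visits

# cube[i][j][t] = measurement of exposure j on individual i at time t.
# None marks a missing measurement.
cube = [
    [[12.0, 14.0], [1.1, None], [20.0, 22.0]],
    [[30.0, 28.0], [2.5, 2.7], [10.0, 9.0]],
    [[8.0, None], [0.9, 1.0], [25.0, 27.0]],
]


def exposure_trajectory(cube, i, j):
    """Return the time series of exposure j for individual i."""
    return cube[i][j]


def time_averaged(cube):
    """Collapse the time axis into an individuals x exposures table.

    Each cell is the mean of the non-missing measurements, or None
    when every time point is missing.
    """
    table = []
    for person in cube:
        row = []
        for series in person:
            observed = [v for v in series if v is not None]
            row.append(mean(observed) if observed else None)
        table.append(row)
    return table


flat = time_averaged(cube)
print(exposure_trajectory(cube, 0, 0))
print(flat[0])
```

Collapsing the time axis is only one possible reduction; it discards the trajectory information that the slide highlights as the third dimension.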
19. What will the exposome data structure look like?:
a high-dimensioned 3D matrix of (1) exposure measurements
on (2) individuals as a function of (3) time
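[Figure: schematic of the exposome data matrix. Axes: individuals, exposome factors, and time. Exposome factors shown: pollutants, diet, metabolites, gut flora, xenobiotics, drugs, mixtures of exposures. Other labels: CVD, BP; genome (static) with GWAS, RVAS, pathway analysis, etc.; EWAS, PheWAS, etc.; integrative; longitudinal; system; genome; panels (A), (B), (C).]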
20. Data-driven investigation for novel exposome factors in the phenome:
Exposome-wide, phenome-wide, and genome-exposome-wide discovery
[Figure: schematic linking the exposome (pollutants, diet, metabolites, gut flora, xenobiotics, drugs, mixtures of exposures) and the phenome (height, weight, CVD, BP, T2D, cancer, mixtures of phenotypes) across individuals and time, alongside the static genome. Analysis labels: GWAS, RVAS, pathway analysis, etc.; EWAS, PheWAS, etc.; integrative. Panels (A), (B), (C).]
Informatics methods to integrate heterogeneous data (E, G, and P)
and to conduct EWAS, GxEWAS, and PheWAS
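The EWAS idea can be sketched as a loop: test each exposure against one phenotype, then correct for the number of tests. The example below is a minimal illustration, not the workgroup's method. It assumes a Pearson correlation per exposure, a Fisher z-transform normal approximation for the p-value, and Benjamini-Hochberg adjustment; published EWAS analyses typically use regression with covariate adjustment instead. All exposure names and values are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_p(r, n):
    """Two-sided p-value for r via the Fisher z approximation (n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest p to smallest
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


def ewas(exposure_table, phenotype):
    """Scan every exposure; return (name, r, p, q) sorted by p."""
    names = list(exposure_table)
    n = len(phenotype)
    rs = [pearson(exposure_table[name], phenotype) for name in names]
    ps = [fisher_p(r, n) for r in rs]
    qs = benjamini_hochberg(ps)
    return sorted(zip(names, rs, ps, qs), key=lambda row: row[2])


# Hypothetical toy data: 8 individuals, 3 exposures, 1 phenotype.
toy_phenotype = [5.1, 5.4, 5.9, 6.3, 6.8, 7.4, 7.9, 8.5]
toy_exposures = {
    "exposure_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "exposure_b": [3, 1, 4, 1, 5, 9, 2, 6],
    "exposure_c": [2, 7, 1, 8, 2, 8, 1, 8],
}

hits = ewas(toy_exposures, toy_phenotype)
for name, r, p, q in hits:
    print(f"{name}: r={r:.3f} p={p:.3g} q={q:.3g}")
```

The same loop run over many phenotypes for one exposure would give a PheWAS-style scan; the multiple-testing step matters more as the number of exposures grows.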
21. Integration challenges in conducting
data-driven investigation for novel exposome factors in the phenome:
The exposome is heterogeneous and G does not equal E.
platform: mass-spec, targeted vs. untargeted
scale: external vs. internal
time-dependent: sampling and life trajectories
type: continuous vs. categorical
correlation: dense!
22. Interdependencies of the exposome:
Correlation globes paint a dense and complex view of exposure
JAMA 2015
Pac Symp Biocomput. 2015
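The raw material for a correlation globe is a list of exposure pairs whose correlation exceeds some cutoff. The sketch below shows one way such an edge list could be computed. The variable names, values, and the 0.5 threshold are assumptions for illustration and are not taken from the cited papers.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_edges(table, threshold=0.5):
    """Return (exposure_a, exposure_b, r) for pairs with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical measurements on five individuals.
toy = {
    "cotinine": [0.1, 0.5, 2.0, 3.5, 4.0],
    "cadmium": [0.2, 0.4, 1.1, 1.9, 2.3],
    "vitamin_c": [1.2, 0.9, 1.1, 1.0, 1.3],
}

edges = correlation_edges(toy)
print(edges)
```

With hundreds of exposures the number of pairs grows quadratically, which is one reason the resulting picture looks dense.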
25. [Figure: chart of variance explained by exposures for each phenotype; x-axis: R^2 * 100, ticks 0, 10, 20, 30, 40.]
Phenotypes shown: Triglycerides; Total Cholesterol; LDL-cholesterol; Trunk Fat; Albumin, urine; Insulin; Total Fat; Head Circumference; Blood urea nitrogen; Albumin; Homocysteine; C-peptide: SI; C-reactive protein; Body Mass Index; Ferritin; Thigh Circumference; Maximal Calf Circumference; Direct HDL-Cholesterol; Total calcium; Total bilirubin; Red cell distribution width; Gamma glutamyl transferase; Mean cell volume; Mean cell hemoglobin; White blood cell count; Uric acid; Protoporphyrin; Hemoglobin; Total protein; Alkaline phosphatase; Waist Circumference; Hematocrit; Weight; Standing Height; 1/Creatinine; Creatinine; Trunk Lean excl BMC; Methylmalonic acid; Triceps Skinfold; Lymphocyte number; Subscapular Skinfold; Total Lean excl BMC; Segmented neutrophils number; Lactate dehydrogenase LDH; Bone alkaline phosphatase; TIBC, Frozen Serum; Aspartate aminotransferase AST; Phosphorus; Lumbar Pelvis BMD; Glycohemoglobin; Globulin; Chloride; Bicarbonate; Alanine aminotransferase ALT; 60 sec. pulse; Upper Leg Length; Total BMD; Potassium; Glucose, serum; Glucose, plasma; Red blood cell count; Lumbar Spine BMD; Platelet count SI; MCHC; Osmolality; Monocyte number; mean systolic; Lymphocyte percent; Segmented neutrophils percent; Recumbent Length; Eosinophils number; Monocyte percent; Head BMD; mean diastolic; Prostate specific antigen ratio; 60 sec HR; Basophils number; Sodium; PSA, free; Mean platelet volume; Eosinophils percent; PSA, total; Basophils percent.
1 to 66 exposures identified for 81
phenotypes
Additive effect of E factors:
Describe less than 10% of variability in P
(On average: 8%)
Stan Shaw, Hugues Aschard, JP Ioannidis
$\sigma^2_E$?
Exposome may enable
realization of
remainder of P (>40%)
Recall: H2 <= 50%
26. What do we do now?
Recommendations from the workgroup
27. Data workgroup recommendation highlights
Comprehensive catalog of documented environmental associations
(e.g., risk, variance explained) to strengthen case for exposome.
Where is evidence robust (e.g., air pollution and CVD)?
Where do we see non-replication?
Where is heritability low and ripe for exposome?
Identify technologies that can measure the exposome.
Targeted and untargeted metabolomics.
28. Develop high-throughput data analytic capability.
Statistical methodologies for the 3D matrix!
Encourage a shift from 1 E to many Es.
Link external and internal exposome measures.
Data workgroup recommendation highlights
[Figure: the exposome-phenome-genome schematic from slide 20, linking exposome factors, phenotypes, individuals, and time alongside the static genome; panels (A), (B), (C).]
Develop data repositories to house and disseminate individual-level
exposome data.
Assess the variability of the exposome in diverse populations
29. Data workgroup recommendation highlights
Identify data standards for exposome research.
Develop data standards to enable the re-use of research to build
large exposome-rich cohorts.
Identify analytics standards for reproducible research.
Software libraries and tools to share methods and findings.
Incentivize other parties (e.g., researchers, funders, and industry) to
integrate the exposome in their existing programs.
30. Data workgroup recommendation highlights
Educate.
Identify example datasets (e.g., NHANES, DEMOCOPHES).
Hackathons and challenges to recruit data scientists.
Develop big data training support (e.g., K awards) directed at
exposome-related research
32. Informatics will enable us to decipher the role of the emerging
exposome in phenotypes to capture the missing $\sigma^2_P$
$\sigma^2_P = \sigma^2_G + \sigma^2_E$
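The arithmetic behind the "missing" variance can be written out using the figures quoted on the earlier slides (heritability of at most 50%, and about 8% of phenotypic variance explained on average by identified exposures). This is a sketch under the slide's simple additive model, which assumes no gene-environment covariance or interaction and ignores measurement error.

```latex
\begin{align*}
\sigma^2_P &= \sigma^2_G + \sigma^2_E
  && \text{additive model from the slide} \\
1 &= \underbrace{\frac{\sigma^2_G}{\sigma^2_P}}_{H^2}
     + \frac{\sigma^2_E}{\sigma^2_P}
  && \text{divide both sides by } \sigma^2_P \\
\frac{\sigma^2_E}{\sigma^2_P} &= 1 - H^2 \;\ge\; 0.5
  && \text{since } H^2 \le 0.5 \\
0.5 - 0.08 &= 0.42 \;>\; 0.40
  && \text{share of } \sigma^2_P \text{ not yet attributed to identified E factors}
\end{align*}
```

The last line is where the "remainder of P (>40%)" figure comes from under these assumptions.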
33. Arjun Manrai (Harvard)*
Yuxia Cui (NIEHS)
Pierre Bushel (NIEHS)
Molly Hall (Penn State, now Penn)*
Spyros Karakitsios (Aristotle U, Greece)
Carolyn Mattingly (NCSU)
Marylyn Ritchie (Geisinger/Penn State)
Charles Schmitt (NIEHS)
Denis Sarigiannis (Aristotle U, Greece)
Duncan Thomas (USC)
David Wishart (U Alberta, Canada)
David Balshaw (NIEHS)
Thanks again to the group:
Funded in part by the NIEHS.