Repurposing large datasets to dissect
exposomic (and genomic) contributions in
health and disease
Chirag J Patel

CDC Office of Public Health Genomics 

2/22/16
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
P = G + E
Type 2 Diabetes

Cancer

Alzheimer’s

Gene expression
Phenotype Genome
Variants
Environment
Infectious agents

Nutrients

Pollutants

Drugs
We are great at G investigation!
over 2400 

Genome-wide Association Studies (GWAS)

https://www.ebi.ac.uk/gwas/
G
Nothing comparable to elucidate E influence!
E: ???
We lack high-throughput methods
and data to discover new E in P…
A similar paradigm for discovery should exist

for E!
Why?
$\sigma^2_P = \sigma^2_G + \sigma^2_E$
$H^2 = \sigma^2_G / \sigma^2_P$
Heritability (H2) is the proportion of phenotypic
variability attributed to genetic variability in a
population
Indicator of the proportion of phenotypic
differences attributed to G.
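A minimal sketch of the definition above, assuming an additive model P = G + E with independent, normally distributed G and E. The function name `simulate_heritability` and the variance values are illustrative assumptions, not from the talk; in real data G is not observed directly, so H2 has to be estimated from relatives (for example, twins).

```python
import random
import statistics


def simulate_heritability(var_g, var_e, n=20000, seed=0):
    """Simulate P = G + E and return the estimated H2 = Var(G) / Var(P)."""
    rng = random.Random(seed)
    # Draw independent genetic and environmental contributions.
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    # The phenotype is the sum of the two contributions.
    p = [gi + ei for gi, ei in zip(g, e)]
    return statistics.pvariance(g) / statistics.pvariance(p)


# Equal genetic and environmental variance: H2 should land near 0.5.
h2_equal = simulate_heritability(var_g=1.0, var_e=1.0)
# Three times more genetic variance: H2 should land near 0.75.
h2_high = simulate_heritability(var_g=3.0, var_e=1.0)
print(round(h2_equal, 2), round(h2_high, 2))
```

Under this model, whatever share of the variance is not genetic is attributed to E.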
Height is an example of a heritable trait:

Francis Galton shows how it's done (1887)
“mid-height of 205 parents
described 60% of variability of 928
offspring”
Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
[Figure: bar chart of heritability, Var(G)/Var(Phenotype), on a 0 to 100 scale for the traits listed above. Source: SNPedia.com]
G estimates for burdensome diseases are low and variable:
massive opportunity for high-throughput E discovery
Type 2 Diabetes
Heart Disease
Autism (50%???)
G estimates for complex traits are low and variable:
massive opportunity for high-throughput E discovery
$\sigma^2_E$: Exposome!
Despite a century of research on complex traits in humans, the
relative importance and specific nature of the influences of
genes and environment on human traits remain controversial.
We report a meta-analysis of twin correlations and reported
variance components for 17,804 traits from 2,748 publications
including 14,558,903 partly dependent twin pairs, virtually
all published twin studies of complex traits. Estimates of
heritability cluster strongly within functional domains,
and across all traits the reported heritability is 49%. For a
majority (69%) of traits, the observed twin correlations are
consistent with a simple and parsimonious model where twin
resemblance is solely due to additive genetic variation. The
data are inconsistent with substantial influences from shared
environment or non-additive genetic variation. This study
provides the most comprehensive analysis of the causes of
individual differences in human traits thus far and will guide
future gene-mapping efforts. All the results can be visualized
using the MaTCH webtool.
Specifically, the partitioning of observed variability into underlying
genetic and environmental sources and the relative importance of
additive and non-additive genetic variation are continually debated1–5.
Recent results from large-scale genome-wide association studies
(GWAS) show that many genetic variants contribute to the variation
in complex traits and that effect sizes are typically small6,7. However,
the sum of the variance explained by the detected variants is much
smaller than the reported heritability of the trait4,6–10. This ‘missing
heritability’ has led some investigators to conclude that non-additive
variation must be important4,11. Although the presence of gene-gene
interaction has been demonstrated empirically5,12–17, little is known
about its relative contribution to observed variation18.
In this study, our aim is twofold. First, we analyze empirical estimates
of the relative contributions of genes and environment for
virtually all human traits investigated in the past 50 years. Second, we
assess empirical evidence for the presence and relative importance of
non-additive genetic influences on all human traits studied. We rely
on classical twin studies, as the twin design has been used widely
to disentangle the relative contributions of genes and environment,
across a variety of human traits. The classical twin design is based
on contrasting the trait resemblance of monozygotic and dizygotic
twin pairs. Monozygotic twins are genetically identical, and dizygotic
twins are genetically full siblings. We show that, for a majority of traits
(69%), the observed statistics are consistent with a simple and parsimonious
model where the observed variation is solely due to additive
genetic variation. The data are inconsistent with a substantial influence
from shared environment or non-additive genetic variation. We also
show that estimates of heritability cluster strongly within functional
domains, and across all traits the reported heritability is 49%. Our
results are based on a meta-analysis of twin correlations and reported
variance components for 17,804 traits from 2,748 publications including
14,558,903 partly dependent twin pairs, virtually all twin studies of
complex traits published between 1958 and 2012. This study provides
the most comprehensive analysis of the causes of individual differences
in human traits thus far and will guide future gene-mapping efforts. All
Meta-analysis of the heritability of human traits based on
fifty years of twin studies
Tinca J C Polderman1,10, Beben Benyamin2,10, Christiaan A de Leeuw1,3, Patrick F Sullivan4–6,
Arjen van Bochoven7, Peter M Visscher2,8,11 & Danielle Posthuma1,9,11
1Department of Complex Trait Genetics, VU University, Center for Neurogenomics
and Cognitive Research, Amsterdam, the Netherlands. 2Queensland Brain
Institute, University of Queensland, Brisbane, Queensland, Australia. 3Institute
for Computing and Information Sciences, Radboud University Nijmegen,
Nijmegen, the Netherlands. 4Center for Psychiatric Genomics, Department
of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA.
5Department of Psychiatry, University of North Carolina, Chapel Hill, North
Carolina, USA. 6Department of Medical Epidemiology and Biostatistics,
Karolinska Institutet, Stockholm, Sweden. 7Faculty of Sciences, VU University,
Insight into the nature of observed variation in human traits is important
in medicine, psychology, social sciences and evolutionary biology.
It has gained new relevance with both the ability to map genes for
human traits and the availability of large, collaborative data sets to do
so on an extensive and comprehensive scale. Individual differences in
human traits have been studied for more than a century, yet the causes
of variation in human traits remain uncertain and controversial.
Nature Genetics, 2015
17,804 traits of the phenome
2,748 publications

14,558,903 twin pairs
Average H2 (genome): 0.49
Exposome may play an equal role.
It took a new paradigm of GWAS for discovery:
Human Genome Project to GWAS
Sequencing of the genome
2001
HapMap project:
http://hapmap.ncbi.nlm.nih.gov/
Characterize common variation
2001-current day
High-throughput variant
assay
< $99 for ~1M variants
Measurement tools
~2003 (ongoing)
ARTICLES
Genome-wide association study of 14,000
cases of seven common diseases and
3,000 shared controls
The Wellcome Trust Case Control Consortium*
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the
identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip
500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major
diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at
P < 5 × 10^-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1
diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these
signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found
compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a
Vol 447|7 June 2007|doi:10.1038/nature05911
WTCCC, Nature, 2007.
Comprehensive, high-throughput analyses
GWAS
Explaining the other 50%:
A big data-driven paradigm for robust discovery of
E in disease via EWAS and the exposome
what to measure? how to measure?
PERSPECTIVES
Xenobiotics
Inflammation
Preexisting disease
Lipid peroxidation
Oxidative stress
Gut flora
Internal chemical environment
External environment
Exposome
RADIATION
DIET
POLLUTION
INFECTIONS
DRUGS
LIFE-STYLE
STRESS
Reactive electrophiles
Metals
Endocrine disrupters
Immune modulators
Receptor-binding proteins
[...] critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting “snapshots” of critical portions of a person’s exposome during different stages of life. At one extreme is a “bottom-up” strategy in which all chemicals in each external source of a subject’s exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a “top-down” strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject’s blood. This would require only a single blood specimen at each time point and would relate directly [...]
[...] ruptors and can be measured through serum [...]
[...]some (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15).

Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples.

With successful characterization of both [...]
Characterizing the exposome. The exposome represents
the combined exposures from all sources that reach the
internal chemical environment. Toxicologically important
classes of exposome chemicals are shown. Signatures and
biomarkers can detect these agents in blood or serum.
“A more comprehensive view of
environmental exposure is
needed ... to discover major
causes of diseases...”
how to analyze in relation to health?
Wild, 2005

Rappaport and Smith, 2010, 2011

Buck-Louis and Sundaram 2012

Miller and Jones, 2014

Patel CJ and Ioannidis JPA, 2014
What is a Genome-Wide Association Study (GWAS)?:
Data-driven search for G factors in P
[Figure, panels a and b: Manhattan plot of −log10(P) (0 to 15) by chromosome (1 to 22, X), with a plot of the observed test statistic. Source: NATURE|Vol 447|7 June 2007]
WTCCC, 2007
AA Aa aa
case
control
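The per-variant test behind such a scan can be sketched as a chi-square test on the genotype (AA, Aa, aa) by case/control table, repeated for every variant. This is a hedged illustration, not the WTCCC pipeline: the SNP names and counts are made up, every genotype is assumed to be observed at least once, and the 5 × 10^-7 threshold is the one quoted in the WTCCC abstract.

```python
import math


def genotype_chi2(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of genotype counts (AA, Aa, aa)."""
    rows = [case_counts, control_counts]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical genotype counts: (cases, controls), each as [AA, Aa, aa].
snps = {
    "snp_a": ([300, 500, 200], [250, 500, 250]),
    "snp_b": ([260, 490, 250], [255, 500, 245]),
}
threshold = 5e-7  # genome-wide threshold quoted in the WTCCC abstract

results = {name: genotype_chi2(cases, controls) for name, (cases, controls) in snps.items()}
for name, (chi2, p_value) in results.items():
    flag = "significant" if p_value < threshold else "not significant"
    print(name, round(chi2, 3), p_value, flag)
```

Running the same test on every variant and reporting every result against one strict threshold is what makes the search comprehensive and multiplicity-controlled.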
Robust, transparent, and comprehensive search for G in P
comprehensive
and transparent
multiplicity
controlled
novel
findings
(and validated)
Patel CJ, Ioannidis JPA, JAMA 2014
Patel CJ, Ioannidis JPA, JECH 2014
Why carry out a Genome-Wide Association Study:
Analytically robust, transparent, and comprehensive 

search for G in P
GWAS example
Example of the big data paradigm:

GWAS drives discovery of G in P
[Figure 1: two Manhattan plots of −log10(P) by chromosome (1 to 22, X), one for the unconditional analysis and one for the conditional analysis.
Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10^-5); association in identified or established region (P < 1 × 10^-4).
Labeled loci: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR]
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-
analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those
taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and
should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously
established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered
conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10−5), whereas
secondary signals close to already confirmed T2D loci are shown in purple (P < 10−4).
Voight et al, Nature Genetics 2012

N=8K T2D, 39K Controls

Impossible to reach this scale in E based investigations
Connecting E with Disease:
Missing the “System” of Exposures?
E+ E-
diseased
non-diseased
?
Exposed to many things, but do not assess the multiplicity.
Fragmented literature of associations.
Challenge to discover E associated with disease.
Examples of exposome-driven discovery machinery
Gold standard for breadth of exposure & behavior data:
National Health and Nutrition Examination Survey
Nutrients and Vitamins

vitamin D, carotenes
Infectious Agents

hepatitis, HIV, Staph. aureus
Plastics and consumables

phthalates, bisphenol A
Physical Activity

e.g., steps
Pesticides and pollutants

atrazine; cadmium; hydrocarbons
Drugs

statins; aspirin
What E are associated with all-cause mortality and 

telomere length?
How does it work?:
Searching for exposures and behaviors associated with all-
cause mortality.
NHANES: 1999-2004
National Death Index linked mortality
246 behaviors and exposures (serum/urine/self-report)
NHANES: 1999-2001
N=330 to 6008 (26 to 655 deaths)
~5.5 years of followup
Cox proportional hazards
baseline exposure and time to death
False discovery rate < 5%
NHANES: 2003-2004
N=177 to 3258 (20-202 deaths)
~2.8 years of followup
p < 0.05
Int J Epidem. 2013
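The two-stage screen above (FDR < 5% in the discovery cohorts, then p < 0.05 in the replication cohort) can be sketched as follows. The slide does not say which FDR procedure was used; the Benjamini-Hochberg step-up rule is assumed here, and the exposure names and p-values are made up. In the study the p-values come from the per-exposure Cox models, which are omitted from this sketch.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the names whose p-values pass the Benjamini-Hochberg step-up rule."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        # Keep the largest rank whose p-value sits under its threshold.
        if p <= fdr * rank / m:
            cutoff_rank = rank
    return {name for name, _ in ranked[:cutoff_rank]}


# Hypothetical p-values from the discovery cohorts, one per exposure.
discovery_p = {
    "exposure_a": 0.0001,
    "exposure_b": 0.0004,
    "exposure_c": 0.02,
    "exposure_d": 0.30,
    "exposure_e": 0.60,
}
# Hypothetical p-values for the same exposures in the replication cohort.
replication_p = {"exposure_a": 0.01, "exposure_b": 0.03, "exposure_c": 0.20}

significant = benjamini_hochberg(discovery_p, fdr=0.05)
replicated = {name for name in significant if replication_p.get(name, 1.0) < 0.05}
print(sorted(significant), sorted(replicated))
```

A fuller version would presumably also require the replication estimate to point in the same direction as the discovery estimate; that check is left out because the slide does not state it.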
[Figure: volcano plot of adjusted hazard ratio (x-axis, 0.4 to 2.8) against -log10(pvalue) (y-axis, 0 to 8), with numbered points.]
Exposures and behaviors:
1 Physical Activity
2 Does anyone smoke in home?
3 Cadmium
4 Cadmium, urine
5 Past smoker
6 Current smoker
7 trans-lycopene
Sociodemographic factors:
1 age (10 year increment)
2 SES_1
3 male
4 SES_0
5 black
6 SES_2
7 SES_3
8 education_hs
9 other_eth
10 mexican
11 occupation_blue_semi
12 education_less_hs
13 occupation_never
14 occupation_blue_high
15 occupation_white_semi
16 other_hispanic
EWAS in All-cause mortality:
253 exposure/behavior associations in survival
Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in
red])
FDR < 5%
sociodemographics
replicated factor
Int J Epidem. 2013
EWAS (re)-identifies factors associated with all-cause mortality:

Volcano plot of 200 associations
age (10 years)
income (quintile 2)
income (quintile 1)
male
black
income (quintile 3)
anyone smoke in home?
Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in red])
serum and urine cadmium [1 SD]
past smoker?
current smoker?
serum lycopene [1 SD]
physical activity [low, moderate, high activity]*
*derived from METs per activity and categorized by Health.gov guidelines
R2 ~ 2%
452 associations in Telomere Length:
Polychlorinated biphenyls associated with longer telomeres?!
IJE, in press
[Figure: volcano plot of effect size (x-axis, −0.2 to 0.2; shorter telomeres on the left, longer telomeres on the right) against −log10(pvalue) (y-axis, 0 to 4). Labeled points: PCBs, Trunk Fat, Alk. Phos, CRP, Cadmium, Cadmium (urine), cigs per day, retinyl stearate, VO2 Max, pulse rate. Annotation: FDR<5%.]
R2 ~ 1%
adjusted by age, age^2, race, poverty, education, occupation
median N=3000; N range: 300-7000
Samples exposed to PCBs associated with difference in genes

implicated in telomere length GWAS?
Expression differences for 24 GWAS implicated genes
Queried the Gene Expression Omnibus for PCBs

Affymetrix human arrays (GPL570)

7 gene expression experiments on humans

52 exposed; 14 unexposed
Differential gene expression and a functional analysis of PCB-exposed children:
Understanding disease and disorder development
Sisir K. Dutta (a,*), Partha S. Mitra (a,1), Somiranjan Ghosh (a,1), Shizhu Zang (a,1), Dean Sonneborn (b), Irva Hertz-Picciotto (b), Tomas Trnovec (c), Lubica Palkovicova (c), Eva Sovcikova (c), Svetlana Ghimbovschi (d), Eric P. Hoffman (d)
(a) Molecular Genetics Laboratory, Howard University, Washington, DC, USA
(b) Department of Public Health Sciences, University of California Davis, Davis, CA, USA
(c) Slovak Medical University, Bratislava, Slovak Republic
(d) Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA
Article history: Received 20 December 2010; Accepted 10 July 2011
Abstract: The goal of the present study is to understand the probable molecular mechanism of toxicities and the associated pathways related to observed pathophysiology in high PCB-exposed populations. We have performed a microarray-based differential gene expression analysis of children (mean age 46.1 months) of [...]
Environment International 40 (2012) 143–154
journal homepage: www.elsevier.com/locate/envint
IJE, in press
Suggestive, but need more N!
[Figure: volcano plot of log(difference) (x-axis, −0.50 to 0.75) against −log10(pvalue) (y-axis, 0 to 2). Labeled probes: 1555203_s_at (SLC44A4), 1555203_s_at (MYNN), 224206_x_at (MYNN).]
Could PCBs influence expression of genes

implicated in telomere length GWAS?
myoneurin

bladder, leukemia, colorectal cancer GWASs
Studying the Elusive Environment in Large Scale
It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.[1] Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.[1] Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.

To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).[2,3] The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al[4] screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.

There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study- [...]

[...] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.

Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).

However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- [...]
VIEWPOINT
Chirag J. Patel, PhD
Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc
Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
JAMA, 2014
JECH, 2014
Pac Symp Biocomput, 2015
How can we study the elusive environment in larger scale for
biomedical discovery?
High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.[7] Eventually, patterns of [...]

US federally funded gene expression experiment data be d[...]
ited in public repositories such as the Gene Expression Omnibu[...]
repository has been instrumental in development of technolo[...]
measurement of gene expression, data standardization, and [...]
of data for discovery. Just as with the Gene Expression Omnib[...]

Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
Panels: A, Serum cotinine (37 total correlations); B, Serum total mercury (42 total correlations); C, Serum cadmium (68 total correlations); D, Serum trans-β-carotene (68 total correlations).
Legend: negative correlation; positive correlation. Node groups: infectious agents; pollutants; nutrients and vitamins; demographic attributes.
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge[...] The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
•bioinformatics to connect exposome with phenome
•new ‘omics technologies to measure the exposome
•dense correlations

•reverse causality
•confounding
•(longitudinal) publicly available data
Interdependencies of the exposome:
Correlation globes paint a complex view of exposure
Red: positive ρ

Blue: negative ρ

thickness: |ρ|
for each pair of E:

Spearman ρ

(575 factors: 81,937 correlations)
permuted data to produce

“null ρ”

sought replication in > 1
cohort
Pac Symp Biocomput. 2015

JECH. 2015
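The pairwise step on this slide (Spearman ρ for a pair of exposures, compared against a "null ρ" from permuted data) can be sketched in plain Python. This is an illustrative assumption about the procedure, not the authors' code: the function names, the number of permutations, and the toy exposure values are all made up, and the replication step across cohorts is omitted.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=1000, seed=0):
    """Compare the observed rho with rho values from shuffled ("null") data."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy values for two hypothetical exposures measured in the same 8 people.
exposure_1 = [1, 2, 3, 4, 5, 6, 7, 8]
exposure_2 = [2, 1, 4, 3, 6, 5, 8, 7]
rho, p_value = permutation_p(exposure_1, exposure_2)
print(round(rho, 3), p_value)
```

Applied to every pair of exposures, the resulting ρ values are what the globes draw as edges.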
Effective number of
variables:

500 (10% decrease)
Telomere Length All-cause mortality
http://bit.ly/globebrowse
Interdependencies of the exposome:
Telomeres vs. all-cause mortality
Studying the Elusive Environment in Large Scale
It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.¹ Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.¹ Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.
To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).²,³ The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al⁴ screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.
There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study-
[…] the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.
Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).
However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven-
VIEWPOINT
Chirag J. Patel, PhD
Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc
Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
Opinion
JAMA, 2014
JECH, 2014
Pac Symp Biocomput, 2015
How can we study the elusive environment in larger scale for
biomedical discovery?
High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.⁷ Eventually, patterns of
US federally funded gene expression experiment data be d
ited in public repositories such as the Gene Expression Omnibu
repository has been instrumental in development of technolo
measurement of gene expression, data standardization, and
of data for discovery. Just as with the Gene Expression Omnib
Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
Panels: A Serum cotinine; B Serum total mercury; C Serum cadmium; D Serum trans-β-carotene
Total correlations (in the order extracted): 37, 42, 68, 68
Edge legend: negative correlation, positive correlation
Node categories: infectious agents; pollutants; nutrients and vitamins; demographic attributes
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge
The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
Opinion Viewpoint
•bioinformatics to connect exposome with phenome
•new ‘omics technologies to measure the exposome
•dense correlations

•reverse causality
•confounding
•(longitudinal) publicly available data
BD2K Patient-Centered Information Commons
Integrated repositories of individual-level information
PI: Isaac Kohane
http://pic-sure.org
with Paul Avillach, Michael McDuffie, Jeremy Easton-Marks, 

Cartik Saravanamuthu and the BD2K PIC-SURE team
NHANES 1999-2006

API available now

http://bit.ly/nhanes_pici
BD2K Patient-Centered Information Commons
NHANES exposome browser
THE PRECISION MEDICINE INITIATIVE
WHAT IS IT?
Precision medicine is an emerging approach for disease
prevention and treatment that takes into account people’s
individual variations in genes, environment, and lifestyle.
The Precision Medicine Initiative will generate the
scientific evidence needed to move the concept of
precision medicine into clinical practice.
WHY NOW?
The time is right because of:
Sequencing
of the human
genome
Improved
technologies for
biomedical analysis
New tools
for using large
datasets
NEAR TERM GOALS
Intensify efforts to apply precision medicine to cancer.
http://www.nih.gov/precisionmedicine
Committee on A Framework for Developing a
New Taxonomy of Disease
Board on Life Sciences
Division on Earth and Life Studies
NRC, National Academy of Sciences 2011
The use of multiple molecular parameters to
characterize disease [P] may lead to a more
accurate and fine-grained classification of
disease [P]…
“multiple molecular parameters” must include E!
P
We are many phenotypes simultaneously:

Can we better categorize these P?
Body Measures

Body Mass Index

Height
Blood pressure & fitness

Systolic BP

Diastolic BP

Pulse rate

VO2 Max
Metabolic

Glucose

LDL-Cholesterol

Triglycerides
Inflammation

C-reactive protein

white blood cell count
Kidney function

Creatinine

Sodium

Uric Acid
Liver function

Aspartate aminotransferase

Gamma glutamyltransferase
Aging

Telomere length
Creation of a phenotype-exposure association map:
A 2-D view of 83 phenotype by 252 exposure associations
> 0
< 0
Association Size:
Clusters of exposures associated with clusters of phenotypes?
252 biomarkers of exposure × 83 clinical trait phenotypes 

NHANES 1999-2000, 2001-2002, 2005-2006

~21K regressions: replicated significant (FDR < 5%) in 2003-2004

adjusted by age, age², sex, race, income, chronic disease

Hugues Aschard, JP Ioannidis
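The screen above (252 exposures × 83 phenotypes, filtered at FDR < 5%) relies on a multiplicity correction. The slide does not say which FDR procedure was used; the sketch below assumes the Benjamini-Hochberg step-up rule, and the function name and example p-values are illustrative only.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which p-values are declared
    significant under the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Size of the screen described on the slide: exposures x phenotypes.
n_tests = 252 * 83
print(n_tests)

# Made-up p-values, only to exercise the function.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example))
```

In a discovery-then-replication design like the one on the slide, the same rule would be applied to the discovery surveys first, and only the surviving pairs would be retested in the held-out survey.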
[Figure: heatmap of association sizes, 83 phenotypes by 252 exposures; color key and histogram of values (count axis 0 to 150), value range -0.4 to 0.4]
http://bit.ly.com/pemap
phenotypes
exposures
+ / -
nutrients
BMI, weight, BMD
metabolic
renal function
PCBs
metabolic
blood parameters
hydrocarbons
Creation of a phenotype-exposure association map:
A 2-D view of connections between P and E
Body Mass Index
Waist circumference
Trunk fat
Total fat
Weight
Total lean fat
Thigh circumference
Calf circumference
Trunk Lean
Skinfold
CRP
Trans-b-carotene
a-carotene
cis-b-carotene
b-cryptoxanthin
lutein/zeaxanthin
Vitamin D
Magnesium
Folate
VO2 Max
PCB180
Cotinine
100 cigs
Cig in last 30
Cadmium
Benzene
Toluene
Smoke in home?
Styrene
Current smoker
3-fluorene
2-fluorene
White blood cell count
Segmented neutrophils
Monocyte number
Lymphocyte number
Eosinophils number
Basophils number
Alkaline phosphotase
Homocysteine
Hemoglobin
Pulse rate
http://bit.ly.com/pemap
EWAS-derived phenotype-exposure association map:
Zooming in to WBC and BMI phenotype clusters
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
7 6 5 4 3 2 1 0
Distance
liver:Albumin
kidney:Bicarbonate
immunological:Basophils percent
immunological:Lymphocyte percent
immunological:Eosinophils percent
kidney:Phosphorus
liver:Total protein
liver:Aspartate aminotransferase AST
liver:Alanine aminotransferase ALT
body measures:Head Circumference
body measures:Recumbent Length
liver:Lactate dehydrogenase LDH
cancer:Prostate specific antigen ratio
cancer:PSA, free
blood:Transferrin saturation
liver:Total bilirubin
heart:Direct HDL-Cholesterol
immunological:Monocyte percent
bone:Head BMD
body measures:Standing Height
body measures:Upper Leg Length
bone:Total BMD
bone:Lumber Spine BMD
bone:Lumber Pelvis BMD
heart:Triglycerides
heart:LDL-cholesterol
heart:Total Cholesterol
blood:MCHC
blood:TIBC, Frozen Serum
blood:Hematocrit
blood:Hemoglobin
kidney:Potassium
blood:Mean cell hemoglobin
blood:Mean cell volume
kidney:Uric acid
kidney:Blood urea nitrogen
kidney:Total calcium
kidney:Creatinine
blood:Ferritin
blood:Red blood cell count
body measures:Weight
blood:Segmented neutrophils percent
body measures:Total Lean excl BMC
body measures:Trunk Lean excl BMC
body measures:Body Mass Index
body measures:Waist Circumference
body measures:Triceps Skinfold
body measures:Maximal Calf Circumference
body measures:Thigh Circumference
liver:Gamma glutamyl transferase
blood pressure:60 sec. pulse:
metabolic:Insulin
body measures:Total Fat
body measures:Trunk Fat
body measures:Subscapular Skinfold
blood pressure:mean systolic
immunological:C-reactive protein
liver:Globulin
immunological:Monocyte number
immunological:Segmented neutrophils number
immunological:Lymphocyte number
immunological:White blood cell count
immunological:Basophils number
immunological:Eosinophils number
blood:Mean platelet volume
heart:Homocysteine
nutrition:Methylmalonic acid
kidney:Osmolality
kidney:Chloride
kidney:Sodium
kidney:Albumin, urine
blood pressure:60 sec HR
cancer:PSA. total
blood:Platelet count SI
blood:Protoporphyrin
blood:Red cell distribution width
bone:Bone alkaline phosphotase
liver:Alkaline phosphotase
blood pressure:mean diastolic
metabolic:C-peptide: SI
metabolic:Glycohemoglobin
metabolic:Glucose, plasma
metabolic:Glucose, serum
inflammation
adiposity
kidney function
metabolic traits
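The dendrogram groups phenotypes by how similar their exposure-association profiles are. The slide gives a "Distance" axis but not the metric or linkage, so the sketch below assumes Euclidean distance with average linkage; the function names and the toy association values are invented for illustration and are not results from the study.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two association profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Toy profiles: each phenotype's association with three hypothetical exposures.
toy_profiles = {
    "Body Mass Index": [0.30, -0.25, 0.10],
    "Waist Circumference": [0.28, -0.22, 0.12],
    "White blood cell count": [-0.05, -0.10, 0.35],
    "Lymphocyte number": [-0.02, -0.12, 0.30],
}
groups = average_linkage(toy_profiles, 2)
print(groups)
```

With real data, each profile would hold one association estimate per exposure, so phenotypes that respond to the same exposures end up on the same branch regardless of their clinical category label.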
“bad” cholesterol
“good” cholesterol
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
height + BMD
Toward a phenotype-exposure association map:
(Re)-categorizing phenotypes with E
[Figure: bar chart of R² × 100 (axis 0 to 40) for each phenotype]
1 to 66 exposures identified for 81
phenotypes

Additive effect of E factors:

Describe < 20% of variability in P
(On average: 8%)
$\sigma^2_E$?
Recall: Avg($h^2$) = 50%

Long road ahead to capture $\sigma^2_P$
Connecting Environmental Exposure with Disease:
Missing the “System” of Exposures?
E+ E-
diseased
non-diseased
?
Exposed to many things, but do not assess the multiplicity.
Fragmented literature of associations.
Challenge to discover E associated with disease.
Example of fragmentation:
Is everything we eat associated with cancer?
Schoenfeld and Ioannidis, AJCN 2012
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studies … outliers are not shown (effect estimates >10).
Of 50, 40 studied in cancer risk
Weak statistical evidence:

non-replicated

inconsistent effects

non-standardized
e modelling
oblem is akin to – but less well
sed and more poorly understood than –
e testing. For example, consider the use
r regression to adjust the risk levels of
atments to the same background level
There can be many covariates, and
t of covariates can be in or out of the
With ten covariates, there are over 1000
models. Consider a maze as a metaphor
elling (Figure 3). The red line traces the
path out of the maze. The path through
ze looks simple, once it is known.
ways in the literature for dealing with model
selection, so we propose a new, composite
2. Publication bias
is general recognition that a paper
much better chance of acceptance if
hing new is found. This means that, for
ation, the claim in the paper has to
sed on a p-value less than 0.05. From
g’s point of view5
, this is quality by
tion. The journals are placing heavy
ce on a statistical test rather than
nation of the methods and steps that
o a conclusion. As to having a p-value
han 0.05, some might be tempted to
the system10
through multiple testing,
ple modelling or unfair treatment of
or some combination of the three that
to a small p-value. Researchers can be
creative in devising a plausible story to
statistical finding.
2 The data cleaning team creates a
modelling data set and a holdout set and
P < 0.05
Figure 3. The path through a complex process can appear quite simple once the path is defined. Which terms are
included in a multiple linear regression model? Each turn in a maze is analogous to including or not a specific
term in the evolving linear model. By keeping an eye on the p-value on the term selected to be at issue, one
can work towards a suitably small p-value. © ktsdesign – Fotolia
A maze of associations is one way to a fragmented
literature and Vibration of Effects
Young, 2011
univariate
sex
sex & age
sex & race
sex & race & age
JCE, 2015
Distribution of associations and p-values due to model choice:
Estimating the Vibration of Effects (or Risk)
Variable of Interest
e.g., 1 SD of log(serum Vitamin D)
Adjusting Variable Set
n=13
All-subsets Cox regression
$2^{13} + 1 = 8{,}193$ models
SES [3rd tertile]
education [>HS]
race [white]
body mass index [normal]
total cholesterol
any heart disease
family heart disease
any hypertension
any diabetes
any cancer
current/past smoker [no smoking]
drink 5/day
physical activity
Data Source
NHANES 1999-2004
417 variables of interest
time to death
N ≥ 1000 (≥ 100 deaths)
effect sizes
p-values
[Figure: −log10(p-value) across adjustment models for Vitamin D (1SD(log)); points labeled by number of adjusting variables k; percentile indicators 1, 50, 99; RHR = 1.14, RPvalue = 4.68; annotations A to E]
median p-value/HR for k
percentile indicator
JCE, 2015
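The all-subsets design above can be sketched in a few lines. This is an illustration under stated assumptions, not the published code: the model fitting itself is left out (a Cox regression would be run once per subset), and the summary definitions used here, RHR as the ratio of the 99th to the 1st percentile hazard ratio and RP as the difference in −log10(p) between those percentiles, are assumed from the percentile indicators shown on the slides. The helper names are hypothetical.

```python
import math
from itertools import combinations

# The 13 adjusting variables listed on the slide.
ADJUSTERS = [
    "SES", "education", "race", "body mass index", "total cholesterol",
    "any heart disease", "family heart disease", "any hypertension",
    "any diabetes", "any cancer", "smoking", "drink 5/day",
    "physical activity",
]


def all_subsets(variables):
    """Yield every subset of the adjusting variables, including the empty set."""
    for k in range(len(variables) + 1):
        for subset in combinations(variables, k):
            yield subset


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def vibration(results):
    """results: list of (hazard_ratio, p_value), one pair per fitted model.
    Returns (RHR, RP) under the assumed percentile-based definitions."""
    hrs = [hr for hr, _ in results]
    neglogp = [-math.log10(p) for _, p in results]
    rhr = percentile(hrs, 99) / percentile(hrs, 1)
    rp = percentile(neglogp, 99) - percentile(neglogp, 1)
    return rhr, rp


models = list(all_subsets(ADJUSTERS))
print(len(models))  # 2**13 subsets; the slide's total counts one more model
```

Enumerating the subsets shows why the model space grows so quickly: each adjusting variable is either in or out, so 13 variables already give 8,192 combinations for a single variable of interest.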
[Figure: −log10(p-value) vs. hazard ratio across adjustment models, points labeled by number of adjusting variables k (0 to 13), percentile indicators 1, 50, 99. Panel 1: Vitamin D (1SD(log)), hazard ratio axis 0.64 to 0.76, RHR = 1.14, RP = 4.68. Panel 2: Thyroxine (1SD(log)), hazard ratio axis 0.75 to 0.90, RHR = 1.15, RP = 2.90]
The Vibration of Effects:
Vitamin D and Thyroxine and attenuated risk in mortality
JCE, 2015
JCE, 2015
Janus (two-faced) risk profile
Risk and significance depend on modeling scenario!
The Vibration of Effects: beware of the Janus effect

(both risk and protection?!)
“risk”“protection”
“significant”
Britannica.com
http://bit.ly/effectvibration
Emerging technologies to ascertain exposome will enable
biomedical discovery
High-throughput E data standards & exposome:

mitigate fragmented literature of associations
Confounding, reverse causality: 

how to handle at large dimension?
e.g., EWASs in telomere length and mortality 

and 81 quantitative phenotypes
Prioritize biological and epidemiological studies.
New ways of measuring P are here now!

Can we use them to assess E (and G)?
physical activity monitors (Fitbit)
smart devices (iOS)
personal E sensors (exposome band?!)
Propeller Health
Now possible to consent thousands of people at the push
of a button! http://researchkit.org
Possible to survey P of diabetics consented through
ResearchKit?
Adam Brown
Stanley Shaw (MGH)

Dennis Ausiello (MGH)
http://bit.ly/glucosuccess
Demographics: age, sex, etc.
Diabetes indicators: hemoglobin A1C; glucose (fasting, bedtime)
Passive activity: motion; step count
N: ~4000 diabetics; 186K manual glucose entries; 7.6M passive step count entries
Age (years): 43.6
Male: 80%; Female: 20%
Race (%): White 57; Black 7; Hispanic 11; Other 25
Education (%): Some high school 2; High school 8; Some college 20; 2-year college 10; 4-year college 26; Post-college 32
http://bit.ly/glucosuccess
Mean Years Diabetic: 7.8
GlucoSuccess has captured a unique population quickly

(< 1 year of surveillance)
Comorbidities: GlucoSuccess % (CDC % in parentheses; source: http://www.cdc.gov/diabetes)
Stroke: 2% (0.7%)
Heart Failure: 2% (1%)
High Blood Pressure: 47% (57%)
High Lipids: 36% (58%)
Kidney Disease: 4% (0.2%, end-stage renal disease)
Circulation problems: 8% (4%)
Eye problems: 9% (17%, visual impairment)
Body Mass Index: 31
Hemoglobin A1C: 7.7
Is step count on the previous day associated with fasting
glucose the next day?
Mashing up 24K step counts with glucose (N=600)
10,000 steps ~ 1.5 mg/dL (random-effects linear model)
p < 1×10^-16
Figure axes: glucose, day N (mg/dL) vs. steps (in 1000s), day N-1
http://bit.ly/glucosuccess
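The "mash-up" on this slide amounts to a one-day lag join followed by a regression. The sketch below shows the join and, as a simpler stand-in for the random-effects linear model named on the slide, a pooled within-person slope (each person's values are centered on their own mean). The data layout, function names, and numbers are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def lag_pairs(steps, glucose):
    """Pair fasting glucose on day N with the same person's steps on day N-1.

    steps, glucose: dicts mapping (user_id, date) -> value.
    Returns a list of (user_id, steps_in_thousands, glucose_mg_dl).
    """
    pairs = []
    for (user, day), mg_dl in glucose.items():
        prev = (user, day - timedelta(days=1))
        if prev in steps:
            pairs.append((user, steps[prev] / 1000.0, mg_dl))
    return pairs


def within_person_slope(pairs):
    """Pooled slope (mg/dL per 1,000 steps) after per-person centering."""
    by_user = defaultdict(list)
    for user, x, y in pairs:
        by_user[user].append((x, y))
    sxy = sxx = 0.0
    for obs in by_user.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx  # undefined if nobody's step count varies


# Made-up records for two people, for illustration only.
d0 = date(2016, 1, 1)
steps = {
    ("a", d0): 4000, ("a", d0 + timedelta(days=1)): 8000,
    ("b", d0): 2000, ("b", d0 + timedelta(days=1)): 6000,
}
glucose = {
    ("a", d0 + timedelta(days=1)): 130.0, ("a", d0 + timedelta(days=2)): 122.0,
    ("b", d0 + timedelta(days=1)): 160.0, ("b", d0 + timedelta(days=2)): 152.0,
}
pairs = lag_pairs(steps, glucose)
slope = within_person_slope(pairs)
```

Centering within person removes stable between-person differences in glucose level, which is the same concern a per-person random intercept addresses; it does not reproduce the slide's model or its estimate.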
GlucoSuccess-like apps can enable longitudinal and
dynamic surveillance of P
However: population-level differences and generalizability
Possible to (re-)use high-throughput data (exposome, medical
claims, devices) to discover the role of E (and G) in P.
Figure: −log10(p-value) of associations by exposure category: acrylamide, allergen test, bacterial infection, cotinine, dialkyl, dioxins, furans (dibenzofuran), heavy metals, hydrocarbons, latex, nutrients (carotenoids, minerals, vitamins A, B, C, D, E), PCBs, perchlorate, pesticides (atrazine, chlorophenol, organochlorine, organophosphate, pyrethroid), phenols, phthalates, phytoestrogens, polybrominated ethers, polyfluorochemicals, viral infection, volatile compounds.
Figure panels: A Serum cotinine, B Serum total mercury (37, 42, 68, and 68 total correlations); legend: infectious agents, pollutants, nutrients and vitamins, demographic attributes.
P = G + E
Acknowledgements
Harvard DBMI: Isaac Kohane, Susanne Churchill, Stan Shaw, Nathan Palmer, Jenn Grandfield, Sunny Alvear, Michal Preminger
Harvard Chan: Hugues Aschard, Francesca Dominici
CDC: Marta Gwinn, Ridgely Green, Muin Khoury, Denise Lowe
Stanford: John Ioannidis, Atul Butte (UCSF)
U Queensland: Jian Yang, Peter Visscher
Cochrane: Belinda Burford
RagGroup: Chirag Lakhani, Adam Brown, Danielle Rasooly, Arjun Manrai, Erik Corona, Nam Pho
NIH Common Fund, Big Data to Knowledge
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org

Repurposing large datasets to dissect exposomic contributions in health and disease

  • 1. Repurposing large datasets to dissect exposomic (and genomic) contributions in health and disease Chirag J Patel CDC Office of Public Health Genomics 2/22/16 chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org
  • 2. P = G + EType 2 Diabetes Cancer Alzheimer’s Gene expression Phenotype Genome Variants Environment Infectious agents Nutrients Pollutants Drugs
  • 3. We are great at G investigation! over 2400 Genome-wide Association Studies (GWAS) https://www.ebi.ac.uk/gwas/ G
  • 4. Nothing comparable to elucidate E influence! E: ??? We lack high-throughput methods and data to discover new E in P…
  • 5. A similar paradigm for discovery should exist for E! Why?
  • 6. σ2 P = σ2 G + σ2 E
  • 7. σ2 G σ2P H2 = Heritability (H2) is the range of phenotypic variability attributed to genetic variability in a population Indicator of the proportion of phenotypic differences attributed to G.
  • 8. Height is an example of a heritable trait: Francis Galton shows how its done (1887) “mid-height of 205 parents described 60% of variability of 928 offspring”
  • 9. Eye color Hair curliness Type-1 diabetes Height Schizophrenia Epilepsy Graves' disease Celiac disease Polycystic ovary syndrome Attention deficit hyperactivity disorder Bipolar disorder Obesity Alzheimer's disease Anorexia nervosa Psoriasis Bone mineral density Menarche, age at Nicotine dependence Sexual orientation Alcoholism Lupus Rheumatoid arthritis Crohn's disease Migraine Thyroid cancer Autism Blood pressure, diastolic Body mass index Depression Coronary artery disease Insomnia Menopause, age at Heart disease Prostate cancer QT interval Breast cancer Ovarian cancer Hangover Stroke Asthma Blood pressure, systolic Hypertension Osteoarthritis Parkinson's disease Longevity Type-2 diabetes Gallstone disease Testicular cancer Cervical cancer Sciatica Bladder cancer Colon cancer Lung cancer Leukemia Stomach cancer 0 25 50 75 100 Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com G estimates for burdensome diseases are low and variable: massive opportunity for high-throughput E discovery Type 2 Diabetes Heart Disease Autism (50%???)
  • 10. Eye color Hair curliness Type-1 diabetes Height Schizophrenia Epilepsy Graves' disease Celiac disease Polycystic ovary syndrome Attention deficit hyperactivity disorder Bipolar disorder Obesity Alzheimer's disease Anorexia nervosa Psoriasis Bone mineral density Menarche, age at Nicotine dependence Sexual orientation Alcoholism Lupus Rheumatoid arthritis Crohn's disease Migraine Thyroid cancer Autism Blood pressure, diastolic Body mass index Depression Coronary artery disease Insomnia Menopause, age at Heart disease Prostate cancer QT interval Breast cancer Ovarian cancer Hangover Stroke Asthma Blood pressure, systolic Hypertension Osteoarthritis Parkinson's disease Longevity Type-2 diabetes Gallstone disease Testicular cancer Cervical cancer Sciatica Bladder cancer Colon cancer Lung cancer Leukemia Stomach cancer 0 25 50 75 100 Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com G estimates for complex traits are low and variable: massive opportunity for high-throughput E discovery σ2 E : Exposome!
  • 11. Paper shown: Meta-analysis of the heritability of human traits based on fifty years of twin studies. Tinca J C Polderman (1,10), Beben Benyamin (2,10), Christiaan A de Leeuw (1,3), Patrick F Sullivan (4–6), Arjen van Bochoven (7), Peter M Visscher (2,8,11) & Danielle Posthuma (1,9,11). Nature Genetics, 2015.
    Affiliations: 1 Department of Complex Trait Genetics, VU University, Center for Neurogenomics and Cognitive Research, Amsterdam, the Netherlands. 2 Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia. 3 Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, the Netherlands. 4 Center for Psychiatric Genomics, Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA. 5 Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, USA. 6 Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. 7 Faculty of Sciences, VU University, ...
    Abstract: Despite a century of research on complex traits in humans, the relative importance and specific nature of the influences of genes and environment on human traits remain controversial. We report a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all published twin studies of complex traits. Estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. For a majority (69%) of traits, the observed twin correlations are consistent with a simple and parsimonious model where twin resemblance is solely due to additive genetic variation. The data are inconsistent with substantial influences from shared environment or non-additive genetic variation. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts. All the results can be visualized using the MaTCH webtool.
    Text shown: Insight into the nature of observed variation in human traits is important in medicine, psychology, social sciences and evolutionary biology. It has gained new relevance with both the ability to map genes for human traits and the availability of large, collaborative data sets to do so on an extensive and comprehensive scale. Individual differences in human traits have been studied for more than a century, yet the causes of variation in human traits remain uncertain and controversial. Specifically, the partitioning of observed variability into underlying genetic and environmental sources and the relative importance of additive and non-additive genetic variation are continually debated [1–5]. Recent results from large-scale genome-wide association studies (GWAS) show that many genetic variants contribute to the variation in complex traits and that effect sizes are typically small [6,7]. However, the sum of the variance explained by the detected variants is much smaller than the reported heritability of the trait [4,6–10]. This ‘missing heritability’ has led some investigators to conclude that non-additive variation must be important [4,11]. Although the presence of gene-gene interaction has been demonstrated empirically [5,12–17], little is known about its relative contribution to observed variation [18]. In this study, our aim is twofold. First, we analyze empirical estimates of the relative contributions of genes and environment for virtually all human traits investigated in the past 50 years. Second, we assess empirical evidence for the presence and relative importance of non-additive genetic influences on all human traits studied. We rely on classical twin studies, as the twin design has been used widely to disentangle the relative contributions of genes and environment, across a variety of human traits. The classical twin design is based on contrasting the trait resemblance of monozygotic and dizygotic twin pairs. Monozygotic twins are genetically identical, and dizygotic twins are genetically full siblings. We show that, for a majority of traits (69%), the observed statistics are consistent with a simple and parsimonious model where the observed variation is solely due to additive genetic variation. The data are inconsistent with a substantial influence from shared environment or non-additive genetic variation. We also show that estimates of heritability cluster strongly within functional domains, and across all traits the reported heritability is 49%. Our results are based on a meta-analysis of twin correlations and reported variance components for 17,804 traits from 2,748 publications including 14,558,903 partly dependent twin pairs, virtually all twin studies of complex traits published between 1958 and 2012. This study provides the most comprehensive analysis of the causes of individual differences in human traits thus far and will guide future gene-mapping efforts.
    Slide summary: 17,804 traits of the phenome; 2,748 publications; 14,558,903 twin pairs; average H2 (genome): 0.49. Exposome may play an equal role.
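The twin contrast quoted on slide 11 (monozygotic versus dizygotic resemblance) can be turned into rough variance components. The sketch below assumes Falconer's formulas, a standard textbook approximation that the slide does not name; the function name and the correlations used are hypothetical and are not values from the paper.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance components from twin correlations (Falconer's formulas).

    r_mz: trait correlation in monozygotic twin pairs
    r_dz: trait correlation in dizygotic twin pairs
    Assumes a purely additive genetic model plus shared and unique environment.
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared (family) environment share
    e2 = 1.0 - r_mz            # unique environment share, incl. measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations chosen only for illustration.
example = falconer_estimates(r_mz=0.6, r_dz=0.3)
print(example)
```

With these illustrative inputs r_MZ is exactly twice r_DZ, the pattern expected when twin resemblance is solely due to additive genetic variation, so the shared-environment term comes out at zero.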
  • 12. It took a new paradigm of GWAS for discovery: Human Genome Project to GWAS. Sequencing of the genome: 2001. Characterize common variation: HapMap project (http://hapmap.ncbi.nlm.nih.gov/), 2001-current day. Measurement tools: high-throughput variant assay, < $99 for ~1M variants, ~2003 (ongoing). Comprehensive, high-throughput analyses: GWAS (WTCCC, Nature, 2007).
    Article shown: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. The Wellcome Trust Case Control Consortium. Nature, Vol 447, 7 June 2007, doi:10.1038/nature05911.
    Abstract: There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10^−7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a ...
  • 13. Explaining the other 50%: A big data-driven paradigm for robust discovery of E in disease via EWAS and the exposome. What to measure? How to measure? How to analyze in relation to health? “A more comprehensive view of environmental exposure is needed ... to discover major causes of diseases...” Wild, 2005; Rappaport and Smith, 2010, 2011; Buck-Louis and Sundaram, 2012; Miller and Jones, 2014; Patel CJ and Ioannidis JPA, 2014.
    [Figure from the PERSPECTIVES article shown (www.sciencemag.org): Characterizing the exposome. The exposome represents the combined exposures from all sources that reach the internal chemical environment. Toxicologically important classes of exposome chemicals are shown. Signatures and biomarkers can detect these agents in blood or serum. Figure labels: External environment; Exposome; RADIATION, DIET, POLLUTION, INFECTIONS, DRUGS, LIFE-STYLE, STRESS; Internal chemical environment; Xenobiotics, Inflammation, Preexisting disease, Lipid peroxidation, Oxidative stress, Gut flora; Reactive electrophiles, Metals, Endocrine disrupters, Immune modulators, Receptor-binding proteins.]
    Article text shown (partly cropped): ... critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting “snapshots” of critical portions of a person’s exposome during different stages of life. At one extreme is a “bottom-up” strategy in which all chemicals in each external source of a subject’s exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a “top-down” strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject’s blood. This would require only a single blood specimen at each time point and would relate directly ... disruptors and can be measured through serum ... chromosome (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15). Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples. With successful characterization of both ...
  • 14. What is a Genome-Wide Association Study (GWAS)?: Data-driven search for G factors in P. [Figure: (a) observed test statistic panel; (b) Manhattan plot of −log10(P) by chromosome (1 to 22, X). WTCCC, Nature, Vol 447, 7 June 2007.] [Table: genotype (AA, Aa, aa) by case/control status.] Robust, transparent, and comprehensive search for G in P
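The genotype-by-case/control table on slide 14 is the unit of analysis that a GWAS repeats for every variant. As a hedged illustration, the sketch below runs a plain Pearson chi-square test on one such 2 × 3 table; this is one common choice, not necessarily the test used by the WTCCC, and the function name and counts are hypothetical. It assumes every expected cell count is positive.

```python
import math


def genotype_chi_square(cases, controls):
    """Pearson chi-square test on a 2 x 3 genotype table.

    cases, controls: counts of the (AA, Aa, aa) genotypes in each group.
    Returns (chi2, p_value); the test has 2 degrees of freedom, for which the
    chi-square survival function has the closed form exp(-chi2 / 2).
    """
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical genotype counts for a single variant.
chi2, p = genotype_chi_square(cases=(30, 50, 20), controls=(20, 50, 30))
print(chi2, p, -math.log10(p))
```

A Manhattan plot such as the one on the slide is the −log10(P) from a test like this, plotted for each variant by chromosomal position and compared against a stringent genome-wide threshold (P < 5 × 10^−7 in the WTCCC abstract on slide 12).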
  • 15. Why carry out a Genome-Wide Association Study: Analytically robust, transparent, and comprehensive search for G in P. [Figure: (a) observed test statistic panel; (b) Manhattan plot of −log10(P) by chromosome (1 to 22, X). Nature, Vol 447, 7 June 2007.] Comprehensive and transparent; multiplicity controlled; novel findings (and validated). Patel CJ, Ioannidis JPA, JAMA 2014; Patel CJ, Ioannidis JPA, JECH 2014
  • 16. GWAS example. Example of the big data paradigm: GWAS drives discovery of G in P. N=8K T2D, 39K Controls. Impossible to reach this scale in E-based investigations. Voight et al, Nature Genetics 2012.
    [Figure 1: Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis; −log10(P) by chromosome (1 to 22, X), unconditional analysis (top) and conditional analysis (bottom). Legend: locus established previously; locus identified by current study; locus not confirmed by current study; suggestive statistical association (P < 1 × 10^−5); association in identified or established region (P < 1 × 10^−4). Loci labeled: BCL11A, THADA, NOTCH2, ADAMTS9, IRS1, IGF2BP2, WFS1, ZBED3, CDKAL1, HHEX/IDE, KCNQ1 (2 signals), TCF7L2, KCNJ11, CENTD2, MTNR1B, HMGA2, ZFAND6, PRC1, FTO, HNF1B, DUSP9, TSPAN8/LGR5, HNF1A, CDC123/CAMK1D, CHCHD9, CDKN2A/2B, SLC30A8, TP53INP1, JAZF1, KLF14, PPAR.]
    Caption: Top panel summarizes the results of the unconditional meta-analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10^−5), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10^−4).
  • 17. Connecting E with Disease: Missing the “System” of Exposures? [2×2 table: exposure (E+, E−) by outcome (diseased, non-diseased): ?] Exposed to many things, but do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
  • 18. Examples of exposome-driven discovery machinery
  • 19. Gold standard for breadth of exposure & behavior data: National Health and Nutrition Examination Survey. Nutrients and vitamins: vitamin D, carotenes. Infectious agents: hepatitis, HIV, Staph. aureus. Plastics and consumables: phthalates, bisphenol A. Physical activity: e.g., steps. Pesticides and pollutants: atrazine; cadmium; hydrocarbons. Drugs: statins; aspirin.
  • 20. What E are associated with all-cause mortality and telomere length?
  • 21. How does it work?: Searching for exposures and behaviors associated with all-cause mortality. Data: NHANES 1999-2004 with National Death Index linked mortality; 246 behaviors and exposures (serum/urine/self-report). Discovery: NHANES 1999-2001, N=330 to 6008 (26 to 655 deaths), ~5.5 years of followup; Cox proportional hazards on baseline exposure and time to death; false discovery rate < 5%. Replication: NHANES 2003-2004, N=177 to 3258 (20 to 202 deaths), ~2.8 years of followup; p < 0.05. Int J Epidem. 2013
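The screen on slide 21 is a two-stage filter: control the false discovery rate across all exposures in one cohort, then ask for nominal significance in a second cohort. The sketch below illustrates only that filtering step, starting from per-exposure p-values. It assumes the Benjamini-Hochberg procedure for the FDR (the slide states only "false discovery rate < 5%"), and the exposure names and p-values are hypothetical; fitting the Cox models themselves is not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def two_stage_screen(discovery_p, replication_p, fdr=0.05, alpha=0.05):
    """Keep exposures with q < fdr in discovery and p < alpha in replication."""
    names = sorted(discovery_p)
    q_values = benjamini_hochberg([discovery_p[name] for name in names])
    return [
        name
        for name, q in zip(names, q_values)
        if q < fdr and replication_p.get(name, 1.0) < alpha
    ]


# Hypothetical p-values for three made-up exposures.
discovery = {"exposure_a": 0.0001, "exposure_b": 0.001, "exposure_c": 0.2}
replication = {"exposure_a": 0.01, "exposure_b": 0.2, "exposure_c": 0.01}
print(two_stage_screen(discovery, replication))
```

In this toy input, exposure_b passes the discovery FDR but fails replication, and exposure_c is nominally significant in the second cohort but never passed discovery, so only exposure_a is retained. A stricter version might also require the effect to point in the same direction in both cohorts; that check is omitted here.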
  • 22. EWAS in All-cause mortality: 253 exposure/behavior associations in survival. [Figure: volcano plot of −log10(pvalue) against adjusted hazard ratio (axis 0.4 to 2.8). Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in red]). Legend: FDR < 5%; sociodemographics; replicated factor.] Labeled exposures and behaviors: 1 Physical Activity; 2 Does anyone smoke in home?; 3 Cadmium; 4 Cadmium, urine; 5 Past smoker; 6 Current smoker; 7 trans-lycopene (11). Labeled sociodemographic factors: 1 age (10 year increment); 2 SES_1; 3 male; 4 SES_0; 5 black; 6 SES_2; 7 SES_3; 8 education_hs; 9 other_eth; 10 mexican; 11 occupation_blue_semi; 12 education_less_hs; 13 occupation_never; 14 occupation_blue_high; 15 occupation_white_semi; 16 other_hispanic (69). Int J Epidem. 2013
  • 23. EWAS (re)-identifies factors associated with all-cause mortality: Volcano plot of 200 associations. [Figure: volcano plot of −log10(pvalue) against adjusted hazard ratio (axis 0.4 to 2.8), with the same numbered labels as slide 22. Multivariate Cox (age, sex, income, education, race/ethnicity, occupation [in red]).] Annotated sociodemographic factors: age (10 years); income (quintile 1); income (quintile 2); income (quintile 3); male; black. Annotated exposures and behaviors: anyone smoke in home?; serum and urine cadmium [1 SD]; past smoker?; current smoker?; serum lycopene [1 SD]; physical activity [low, moderate, high activity]* (*derived from METs per activity and categorized by Health.gov guidelines). R2 ~ 2%
  • 24. 452 associations in Telomere Length: Polychlorinated biphenyls associated with longer telomeres?! [Figure: volcano plot of −log10(pvalue) (0 to 4) against effect size (−0.2 to 0.2), with the axis annotated from shorter telomeres to longer telomeres and FDR<5% marked. Labeled points: PCBs; Trunk Fat; Alk. Phos; CRP; Cadmium; Cadmium (urine); cigs per day; retinyl stearate; VO2 Max; pulse rate.] R2 ~ 1%. Adjusted by age, age², race, poverty, education, occupation. Median N=3000; N range: 300-7000. IJE, in press
  • 25. Samples exposed to PCBs associated with difference in genes implicated in telomere length GWAS? Expression differences for 24 GWAS implicated genes. Queried the Gene Expression Omnibus for PCBs: Affymetrix human arrays (GPL570); 7 gene expression experiments on humans; 52 exposed; 14 unexposed. IJE, in press.
    Paper shown: Differential gene expression and a functional analysis of PCB-exposed children: Understanding disease and disorder development. Sisir K. Dutta (a), Partha S. Mitra (a), Somiranjan Ghosh (a), Shizhu Zang (a), Dean Sonneborn (b), Irva Hertz-Picciotto (b), Tomas Trnovec (c), Lubica Palkovicova (c), Eva Sovcikova (c), Svetlana Ghimbovschi (d), Eric P. Hoffman (d). (a) Molecular Genetics Laboratory, Howard University, Washington, DC, USA; (b) Department of Public Health Sciences, University of California Davis, Davis, CA, USA; (c) Slovak Medical University, Bratislava, Slovak Republic; (d) Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA. Environment International 40 (2012) 143–154. Received 20 December 2010; accepted 10 July 2011.
    Abstract (excerpt): The goal of the present study is to understand the probable molecular mechanism of toxicities and the associated pathways related to observed pathophysiology in high PCB-exposed populations. We have performed a microarray-based differential gene expression analysis of children (mean age 46.1 months) of ...
  • 26. Suggestive, but need more N! Could PCBs influence expression of genes implicated in telomere length GWAS? [Figure: volcano plot of −log10(pvalue) (0 to 2) against log(difference) (−0.50 to 0.75). Labeled probes: 1555203_s_at (SLC44A4); 1555203_s_at (MYNN); 224206_x_at (MYNN).] myoneurin: bladder, leukemia, colorectal cancer GWASs
  • 27. How can we study the elusive environment in larger scale for biomedical discovery? •bioinformatics to connect exposome with phenome •new ‘omics technologies to measure the exposome •dense correlations •reverse causality •confounding •(longitudinal) publicly available data. JAMA, 2014; JECH, 2014; Proc Symp Biocomp, 2015.
    Article shown (Opinion, Viewpoint): Studying the Elusive Environment in Large Scale. Chirag J. Patel, PhD, Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts. John P. A. Ioannidis, MD, DSc, Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
    It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.[1] Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.[1] Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.
    To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).[2,3] The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies). For example, Wang et al [4] screened more than 2000 chemicals in serum to discover endogenous exposures associated with risk for cardiovascular disease.
    There are notable hurdles in analyzing “big” environmental data. These same problems affect epidemiology of 1-risk-factor-at-a-time, but in EWAS their prevalence becomes more clearly manifest at large scale. When study- ... the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex relationship with other nutrients and pollutants.
    Given this complexity, how can studies of environmental risk move forward? First, EWAS analyses should be applied to multiple data sets, and consistency can be formally examined for all assessed correlations. Second, the temporal relationship between exposure and changes in health parameters may offer helpful hints about which of the signals are more than simple correlations. Third, standardized adjusted analyses, in which adjustments are performed systematically and in the same way across multiple data sets, may also help. This is in stark contrast with the current model, whereby most epidemiologic studies use single data sets without replication as well as non–time-dependent assessments, and reported adjustments are markedly different across reports and data sets, even those performed by the same team (different approaches increase validity but must be reconciled and assimilated).
    However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- ...
    High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.[7] Eventually, patterns of ... US federally funded gene expression experiment data be deposited in public repositories such as the Gene Expression Omnibus ... repository has been instrumental in development of technolo... measurement of gene expression, data standardization, and ... of data for discovery. Just as with the Gene Expression Omnibus ...
    Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004. Panels: A, Serum cotinine; B, Serum total mercury; C, Serum cadmium; D, Serum trans-β-carotene. Total correlations, as listed: 37, 42, 68, 68. Legend: negative correlation; positive correlation; infectious agents; pollutants; nutrients and vitamins; demographic attributes. Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge... The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
  • 28. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. Red: positive ρ; blue: negative ρ; thickness: |ρ|. For each pair of E: Spearman ρ (575 factors: 81,937 correlations); permuted data to produce “null ρ”; sought replication in > 1 cohort. Pac Symp Biocomput. 2015; JECH. 2015
  • 29. Interdependencies of the exposome: Correlation globes paint a complex view of exposure. Red: positive ρ; blue: negative ρ; thickness: |ρ|. For each pair of E: Spearman ρ (575 factors: 81,937 correlations); permuted data to produce “null ρ”; sought replication in > 1 cohort. Effective number of variables: 500 (10% decrease). Pac Symp Biocomput. 2015; JECH. 2015
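The correlation-globe recipe on slides 28 and 29 (Spearman ρ for a pair of exposures, compared against a permuted "null ρ") can be sketched for a single pair as follows. This is a plain-Python illustration under stated assumptions, not the authors' pipeline: the function names, the number of permutations, the two-sided test, and the toy data are all hypothetical, and both inputs are assumed to be non-constant.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided p-value: share of shuffles with |null rho| >= |observed rho|."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: a monotone but non-linear relationship between two "exposures".
exposure_1 = [1, 2, 3, 4, 5, 6, 7, 8]
exposure_2 = [v ** 3 for v in exposure_1]
print(spearman(exposure_1, exposure_2), permutation_p_value(exposure_1, exposure_2))
```

Rank-based correlation is a natural fit here because exposure measurements are often skewed, and shuffling one variable gives an empirical null without distributional assumptions. Repeating this over every pair of factors would still require the multiplicity control discussed on the earlier slides.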
  • 30. Telomere Length All-cause mortality http://bit.ly/globebrowse Interdependencies of the exposome: Telomeres vs. all-cause mortality
  • 31. Studying the Elusive Environment in Large Scale Itispossiblethatmorethan50%ofcomplexdiseaserisk isattributedtodifferencesinanindividual’senvironment.1 Airpollution,smoking,anddietaredocumentedenviron- mental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure loadoccurringthroughoutaperson’slifetime.1 Investigat- ing one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associa- tions. Much of that literature is not reproducible, and se- lectivereportingmaybeamajorreasonforthelackofre- producibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting. Toremedythelackofreproducibilityandconcernsof validity, multiple personal exposures can be assessed si- multaneously in terms of their association with a condi- tion or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).2,3 The main advan- tages of this process include the ability to search the list ofexposuresandadjustformultiplicitysystematicallyand reportalltheprobedassociationsinsteadofonlythemost significant results. The term “environment-wide associa- tion studies” (EWAS) has been used to describe this ap- proach (an analogy to genome-wide association stud- ies).Forexample,Wangetal4 screenedmorethan2000 chemicalsinserumtodiscoverendogenousexposuresas- sociated with risk for cardiovascular disease. Therearenotablehurdlesinanalyzing“big”environ- mental data. These same problems affect epidemiology of1-risk-factor-at-a-time,butinEWAStheirprevalencebe- comes more clearly manifest at large scale. When study- the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex rela- tionship with other nutrients and pollutants. 
Giventhiscomplexity,howcanstudiesofenvironmen- talriskmoveforward?First,EWASanalysesshouldbeap- pliedtomultipledatasets,andconsistencycanbeformally examinedforallassessedcorrelations.Second,thetempo- ral relationship between exposure and changes in health parametersmayofferhelpfulhintsaboutwhichofthesig- nalsaremorethansimplecorrelations.Third,standardized adjustedanalyses,inwhichadjustmentsareperformedsys- tematicallyandinthesamewayacrossmultipledatasets, may also help. This is in stark contrast with the current model,wherebymostepidemiologicstudiesusesingledata setswithoutreplicationaswellasnon–time-dependentas- sessments,andreportedadjustmentsaremarkedlydiffer- entacrossreportsanddatasets,eventhoseperformedby thesameteam(differentapproachesincreasevaliditybut mustbereconciledandassimilated). However, eventually for most environmental cor- relates,theremaybeunsurpassabledifficultyestablish- ing potential causal inferences based on observational data alone. Factors that seem protective may some- times be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge thatinterveningtomodify1putativeriskfactoralsomay inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomizedtrials(affectingasingleriskfactoramongthe manycorrelations),theinterventionisnotreallysimple. In essence what is tested are multiple perturbations of factors correlated with the one targeted for interven- VIEWPOINT Chirag J. Patel, PhD Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts. John P. A. Ioannidis, MD, DSc Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California. 
Opinion JAMA, 2014 JECH, 2014 Proc Symp Biocomp, 2015 How can we study the elusive environment in larger scale for biomedical discovery? Studying the Elusive Environment in Large Scale Itispossiblethatmorethan50%ofcomplexdiseaserisk isattributedtodifferencesinanindividual’senvironment.1 Airpollution,smoking,anddietaredocumentedenviron- mental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure loadoccurringthroughoutaperson’slifetime.1 Investigat- ing one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associa- tions. Much of that literature is not reproducible, and se- lectivereportingmaybeamajorreasonforthelackofre- producibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting. Toremedythelackofreproducibilityandconcernsof validity, multiple personal exposures can be assessed si- multaneously in terms of their association with a condi- tion or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).2,3 The main advan- tages of this process include the ability to search the list ofexposuresandadjustformultiplicitysystematicallyand reportalltheprobedassociationsinsteadofonlythemost significant results. The term “environment-wide associa- tion studies” (EWAS) has been used to describe this ap- the EWAS vantage point, intervening on β-carotene (Figure, D) seems a futile exercise given its complex rela- tionship with other nutrients and pollutants. 
Giventhiscomplexity,howcanstudiesofenvironmen- talriskmoveforward?First,EWASanalysesshouldbeap- pliedtomultipledatasets,andconsistencycanbeformally examinedforallassessedcorrelations.Second,thetempo- ral relationship between exposure and changes in health parametersmayofferhelpfulhintsaboutwhichofthesig- nalsaremorethansimplecorrelations.Third,standardized adjustedanalyses,inwhichadjustmentsareperformedsys- tematicallyandinthesamewayacrossmultipledatasets may also help. This is in stark contrast with the current model,wherebymostepidemiologicstudiesusesingledata setswithoutreplicationaswellasnon–time-dependentas- sessments,andreportedadjustmentsaremarkedlydiffer- entacrossreportsanddatasets,eventhoseperformedby thesameteam(differentapproachesincreasevaliditybut mustbereconciledandassimilated). However, eventually for most environmental cor- relates,theremaybeunsurpassabledifficultyestablish- ing potential causal inferences based on observationa data alone. Factors that seem protective may some- times be tested in randomized trials. The complexity of VIEWPOINT Chirag J. Patel, PhD Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts. John P. A. Ioannidis, MD, DSc Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California. Opinion High-throughputascertainmentofendogenousindicatorsofen- vironmentalexposurethatmayreflecttheexposomeincreasinglyat- tractattention,andtheirperformanceneedstobecarefullyevaluated. 
These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.7 Eventually, patterns of […] US federally funded gene expression experiment data be d[…]ited in public repositories such as the Gene Expression Omnibu[…] repository has been instrumental in development of technolo[…] measurement of gene expression, data standardization, and […] of data for discovery. Just as with the Gene Expression Omnib[…]
Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004. Panels: A, serum cotinine; B, serum total mercury; C, serum cadmium; D, serum trans-β-carotene. Total correlations per panel, in the order printed: 37, 42, 68, 68. Edge legend: negative correlation, positive correlation. Node legend: infectious agents, pollutants, nutrients and vitamins, demographic attributes. Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge[…] The size of each node is proportional to the number of edges for a node, and thickness of each edge indicates the magnitude of the correlation.
Slide annotations: bioinformatics to connect exposome with phenome; new 'omics technologies to measure the exposome; dense correlations; reverse causality; confounding; (longitudinal) publicly available data.
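The figure legend above describes a simple rule: draw an edge from the exposure of interest to every other exposure whose pairwise correlation exceeds 0.2 in absolute value. A minimal sketch of that rule follows. The exposure names and values are made-up toy data, and plain Pearson correlation is an assumption here; the published analysis may have used a different correlation measure or survey weighting.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def globe_edges(exposures, focus, threshold=0.2):
    """Return {exposure: r} for every exposure with |r| > threshold vs. focus."""
    edges = {}
    for name, values in exposures.items():
        if name == focus:
            continue
        r = pearson(exposures[focus], values)
        if abs(r) > threshold:
            edges[name] = r
    return edges


# Hypothetical toy measurements for four participants (not NHANES data).
toy = {
    "cotinine": [1.0, 2.0, 3.0, 4.0],
    "cadmium": [2.0, 4.0, 6.0, 8.0],      # rises with cotinine
    "carotene": [4.0, 3.0, 2.0, 1.0],     # falls as cotinine rises
    "unrelated": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with cotinine
}
edges = globe_edges(toy, "cotinine")
```

The number of keys in `edges` corresponds to the "total correlations" count shown for a globe, and the sign of each value to the positive or negative edge legend.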
  • 32. BD2K Patient-Centered Information Commons Integrated repositories of individual-level information PI: Isaac Kohane http://pic-sure.org
  • 33. with Paul Avillach, Michael McDuffie, Jeremy Easton-Marks, Cartik Saravanamuthu and the BD2K PIC-SURE team NHANES 1999-2006 API available now http://bit.ly/nhanes_pici BD2K Patient-Centered Information Commons NHANES exposome browser
  • 34. THE PRECISION MEDICINE INITIATIVE. WHAT IS IT? Precision medicine is an emerging approach for disease prevention and treatment that takes into account people’s individual variations in genes, environment, and lifestyle. The Precision Medicine Initiative will generate the scientific evidence needed to move the concept of precision medicine into clinical practice. WHY NOW? The time is right because of: sequencing of the human genome; improved technologies for biomedical analysis; new tools for using large datasets. NEAR TERM GOALS: Intensify efforts to apply precision medicine to cancer. http://www.nih.gov/precisionmedicine
  • 35. Committee on A Framework for Developing a New Taxonomy of Disease, Board on Life Sciences, Division on Earth and Life Studies, NRC, National Academy of Sciences, 2011. The use of multiple molecular parameters to characterize disease [P] may lead to a more accurate and fine-grained classification of disease [P]… “Multiple molecular parameters” must include E!
  • 36. P. We are many phenotypes simultaneously: Can we better categorize these P? Body measures: body mass index, height. Blood pressure & fitness: systolic BP, diastolic BP, pulse rate, VO2 max. Metabolic: glucose, LDL-cholesterol, triglycerides. Inflammation: C-reactive protein, white blood cell count. Kidney function: creatinine, sodium, uric acid. Liver function: aspartate aminotransferase, gamma glutamyltransferase. Aging: telomere length.
  • 37. Creation of a phenotype-exposure association map: A 2-D view of 83 phenotype by 252 exposure associations. [Heatmap: 83 phenotypes × 252 exposures; color shows association size (> 0 or < 0).] Clusters of exposures associated with clusters of phenotypes? 252 biomarkers of exposure × 83 clinical trait phenotypes. NHANES 1999-2000, 2001-2002, 2005-2006. ~21K regressions: replicated significant (FDR < 5%) in 2003-2004; adjusted by age, age², sex, race, income, chronic disease. Hugues Aschard, JP Ioannidis.
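The "~21K regressions" figure follows from testing every exposure against every phenotype (252 × 83), and the slide reports a false discovery rate threshold of 5%. The sketch below checks the arithmetic and shows one common way to control the FDR, the Benjamini-Hochberg step-up procedure. The slide does not name the FDR procedure that was used, so Benjamini-Hochberg is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test passes FDR control at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


n_exposures, n_phenotypes = 252, 83
n_regressions = n_exposures * n_phenotypes  # 20,916, i.e. roughly 21K

# Hypothetical p-values from five exposure-phenotype regressions.
example_p = [0.001, 0.008, 0.039, 0.041, 0.9]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

In a discovery-then-replication design such as the one on the slide, a filter like this would be applied in the discovery surveys, and only the surviving pairs would be re-tested in the held-out 2003-2004 survey.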
  • 38. Creation of a phenotype-exposure association map: A 2-D view of connections between P and E. [Heatmap of phenotypes by exposures. Exposure labels span dietary nutrients, serum nutrients, physical activity, smoking and alcohol, hydrocarbons, phthalates, heavy metals, volatile compounds, dioxins, pesticides, PCBs, perfluorinated compounds, furans, and infectious agents; phenotype labels span kidney, liver, blood, immunological, blood pressure, lipid, metabolic, bone, and body measures. Color key and histogram: value from -0.4 to 0.4 (- to +); count from 0 to 150. Annotations on the map: nutrients; BMI, weight, BMD; metabolic; renal function; PCBs; metabolic; blood parameters; hydrocarbons.] http://bit.ly.com/pemap
  • 39. EWAS-derived phenotype-exposure association map: Zooming in to WBC and BMI phenotype clusters. [Heatmap detail, with the full map from the previous slide shown alongside. Labels in the zoomed view: Body Mass Index, waist circumference, trunk fat, total fat, weight, total lean fat, thigh circumference, calf circumference, trunk lean, skinfold, CRP, trans-β-carotene, α-carotene, cis-β-carotene, β-cryptoxanthin, lutein/zeaxanthin, vitamin D, magnesium, folate, VO2 max, PCB180, cotinine, 100 cigs, cigs in last 30, cadmium, benzene, toluene, smoke in home?, styrene, current smoker, 3-fluorene, 2-fluorene, white blood cell count, segmented neutrophils, monocyte number, lymphocyte number, eosinophils number, basophils number, alkaline phosphatase, homocysteine, hemoglobin, pulse rate.] http://bit.ly.com/pemap
  • 40. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. [Dendrogram of the phenotypes; distance axis from 0 to 7. Each leaf is labeled with a category and a phenotype (categories: liver, kidney, immunological, body measures, cancer, blood, heart, bone, blood pressure, metabolic, nutrition). Highlighted clusters: inflammation, adiposity, kidney function, metabolic traits.]
  • 41. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. [Same dendrogram of the phenotypes; distance axis from 0 to 7. Highlighted: “bad” cholesterol and “good” cholesterol.]
  • 42. Toward a phenotype-exposure association map: (Re)-categorizing phenotypes with E. [Same dendrogram of the phenotypes; distance axis from 0 to 7. Highlighted: height + BMD.]
  • 43. 1 to 66 exposures identified for 81 phenotypes. Additive effect of E factors: describe < 20% of variability in P (on average: 8%). σ²E? Recall: Avg(h²) = 50%. Long road ahead to capture σ²P. [Bar chart: R² × 100 (axis 0 to 40) for each phenotype, from Triglycerides, Total Cholesterol, and LDL-cholesterol through PSA total and Basophils percent.]
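The size of the gap can be written out with the variance decomposition used earlier in the talk. This is a back-of-the-envelope sketch only: it assumes the purely additive model P = G + E with no interaction or covariance terms, and it treats the average R² of 8% as the share of phenotypic variance explained by the measured exposures.

```latex
\sigma^2_P = \sigma^2_G + \sigma^2_E,
\qquad
H^2 = \frac{\sigma^2_G}{\sigma^2_P} \approx 0.50
\;\Longrightarrow\;
\frac{\sigma^2_E}{\sigma^2_P} \approx 0.50,
\qquad
\frac{\overline{R^2}_{\text{measured }E}}{\sigma^2_E / \sigma^2_P}
\approx \frac{0.08}{0.50} = 0.16
```

Under those assumptions, the identified exposures would account for roughly one sixth of the variance nominally attributable to E, which is one way to read "long road ahead".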
  • 44. Connecting Environmental Exposure with Disease: Missing the “System” of Exposures? [2×2 table: E+ / E- by diseased / non-diseased, marked “?”.] Exposed to many things, but do not assess the multiplicity. Fragmented literature of associations. Challenge to discover E associated with disease.
  • 45. Example of fragmentation: Is everything we eat associated with cancer? Schoenfeld and Ioannidis, AJCN 2012. 50 random ingredients from the Boston Cooking School Cookbook: any associated with cancer? Of 50, 40 studied in cancer risk. Weak statistical evidence: non-replicated, inconsistent effects, non-standardized. FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studie[…] outliers are not shown (effect estimates >10).
  • 46. A maze of associations is one way to a fragmented literature and Vibration of Effects. Young, 2011; JCE, 2015. Model labels on the slide: univariate; sex; sex & age; sex & race; sex & race & age.
Excerpt from Young, 2011 (cropped page image; gaps in the extracted text are marked […]): […]e modelling […]oblem is akin to – but less well […]sed and more poorly understood than – […]e testing. For example, consider the use […]r regression to adjust the risk levels of […]atments to the same background level […] There can be many covariates, and […]t of covariates can be in or out of the […] With ten covariates, there are over 1000 models. Consider a maze as a metaphor […]elling (Figure 3). The red line traces the […] path out of the maze. The path through […]ze looks simple, once it is known. […] ways in the literature for dealing with model selection, so we propose a new, composite […]
2. Publication bias […] is general recognition that a paper […] much better chance of acceptance if […]hing new is found. This means that, for […]ation, the claim in the paper has to […]sed on a p-value less than 0.05. From […]g’s point of view5, this is quality by […]tion. The journals are placing heavy […]ce on a statistical test rather than […]nation of the methods and steps that […]o a conclusion. As to having a p-value […]han 0.05, some might be tempted to […] the system10 through multiple testing, […]ple modelling or unfair treatment of […] or some combination of the three that […] to a small p-value. Researchers can be […] creative in devising a plausible story to […] statistical finding. […] 2 The data cleaning team creates a modelling data set and a holdout set and […]
Figure 3. The path through a complex process can appear quite simple once the path is defined. Which terms are included in a multiple linear regression model? Each turn in a maze is analogous to including or not a specific term in the evolving linear model. By keeping an eye on the p-value on the term selected to be at issue, one can work towards a suitably small p-value. © ktsdesign – Fotolia. [Maze image annotated "P < 0.05".]
  • 47. Distribution of associations and p-values due to model choice: Estimating the Vibration of Effects (or Risk). JCE, 2015. Variable of interest: e.g., 1 SD of log(serum vitamin D). Adjusting variable set (n = 13): SES [3rd tertile], education [>HS], race [white], body mass index [normal], total cholesterol, any heart disease, family heart disease, any hypertension, any diabetes, any cancer, current/past smoker [no smoking], drink 5/day, physical activity. All-subsets Cox regression: 2^13 + 1 = 8,193 models. Data source: NHANES 1999-2004; 417 variables of interest; time to death; N ≥ 1000 (≥ 100 deaths). Outputs: effect sizes and p-values. [Example plots of −log10(p-value) against hazard ratio, with points marking the median p-value/HR for each number of adjusting variables k (0 to 13) and percentile indicators at 1, 50, and 99; panel callouts A to E. Vitamin D (1 SD of log): RHR = 1.14, RP = 4.68. Thyroxine (1 SD of log): RHR = 1.15, RP = 2.90.]
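The model count on this slide comes from enumerating every subset of the 13 adjusting variables. A minimal sketch of that enumeration, plus a summary of how much the estimates "vibrate" across models, is given below. The definitions of RHR (ratio of the 99th to the 1st percentile hazard ratio) and RP (difference between the 99th and 1st percentile of −log10 p) are assumptions taken to match the slide's labels, the variable names are shortened from the slide, and the Cox model fitting itself is omitted.

```python
from itertools import combinations
import math

# Adjusting variables as listed on the slide (names shortened).
ADJUSTERS = [
    "SES", "education", "race", "body mass index", "total cholesterol",
    "any heart disease", "family heart disease", "any hypertension",
    "any diabetes", "any cancer", "smoking", "drink 5/day", "physical activity",
]


def all_adjustment_sets(variables):
    """Yield every subset of the adjusting variables, including the empty set."""
    for size in range(len(variables) + 1):
        yield from combinations(variables, size)


def percentile(values, q):
    """Linear-interpolation percentile of a list, with q between 0 and 100."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(position), math.ceil(position)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def vibration_of_effects(hazard_ratios, p_values):
    """Return (RHR, RP) under the assumed definitions given in the lead-in."""
    neg_log_p = [-math.log10(p) for p in p_values]
    rhr = percentile(hazard_ratios, 99) / percentile(hazard_ratios, 1)
    rp = percentile(neg_log_p, 99) - percentile(neg_log_p, 1)
    return rhr, rp


# 2^13 subsets; the slide's total of 8,193 is this count plus one.
n_subsets = sum(1 for _ in all_adjustment_sets(ADJUSTERS))
```

In practice each subset would be passed to a survival-model fit for the variable of interest, and the resulting hazard ratios and p-values collected into the two lists that `vibration_of_effects` summarizes.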
  • 48. The Vibration of Effects: Vitamin D and Thyroxine and attenuated risk in mortality. JCE, 2015. [Two plots of −log10(p-value) against hazard ratio across models with 0 to 13 adjusting variables, with percentile indicators at 1, 50, and 99. Vitamin D (1 SD of log): hazard ratio axis 0.64 to 0.76; RHR = 1.14, RP = 4.68. Thyroxine (1 SD of log): hazard ratio axis 0.75 to 0.90; RHR = 1.15, RP = 2.90.]
  • 49. The Vibration of Effects: beware of the Janus effect (both risk and protection?!). JCE, 2015. Janus (two-faced) risk profile: risk and significance depend on the modeling scenario! [Plot regions labeled “risk”, “protection”, and “significant”. Janus image: Britannica.com]
  • 51. Emerging technologies to ascertain exposome will enable biomedical discovery High-throughput E data standards & exposome: mitigate fragmented literature of associations Confounding, reverse causality: how to handle at large dimension? e.g., EWASs in telomere length and mortality and 81 quantitative phenotypes Prioritize biological and epidemiological studies.
  • 52. New ways of measuring P are here now! Can we use them to assess E (and G)?
  • 53. Physical activity monitors (Fitbit); smart devices (iOS); personal E sensors (exposome band?!); Propeller Health
  • 54. Now possible to consent thousands of people at the push of a button! http://researchkit.org
  • 55. Possible to survey P of diabetics consented through ResearchKit? Adam Brown Stanley Shaw (MGH) Dennis Ausiello (MGH) http://bit.ly/glucosuccess
  • 56. http://bit.ly/glucosuccess Demographics: age, sex, etc. Diabetes indicators: hemoglobin A1C, glucose (fasting, bedtime). Passive activity: motion, step count. N ~4000 diabetics; 186K manual glucose entries; 7.6M passive step count entries.
  • 57. GlucoSuccess has captured a unique population quickly (< 1 year of surveillance). http://bit.ly/glucosuccess Age (years): 43.6. Male: 80%; female: 20%. Race: White 57%, Black 7%, Hispanic 11%, Other 25%. Education: some high school 2%, high school 8%, some college 20%, 2-year college 10%, 4-year college 26%, post-college 32%. Mean years diabetic: 7.8. Body mass index: 31. Hemoglobin A1C: 7.7. Comorbidities, with the CDC figure in parentheses (http://www.cdc.gov/diabetes): stroke 2% (0.7%); heart failure 2% (1%); high blood pressure 47% (57%); high lipids 36% (58%); kidney disease 4% (0.2%, end-stage renal disease); circulation problems 8% (4%); eye problems 9% (17%, visual impairment).
  • 58. Is step count on previous day associated with fasting glucose the next day?: mashing up 24K step counts with glucose (N=600). 10000 steps ~ 1.5 mg/dL (random-effects linear model), p < 1×10^-16. [Scatter plot: glucose on day N (mg/dL) against steps (in 1000s) on day N-1.]
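The "mash-up" on this slide amounts to joining each person's fasting glucose on day N with their step total on day N-1. The sketch below shows that join on hypothetical records. It then estimates a slope after centering both variables within each person, which is a deliberately simpler stand-in for the random-effects linear model named on the slide, not a reproduction of it. The record layout, field order, and function names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta


def pair_previous_day_steps(step_rows, glucose_rows):
    """Pair each glucose reading on day N with the step total on day N-1.

    step_rows: iterable of (user_id, day, steps)
    glucose_rows: iterable of (user_id, day, glucose_mg_dl)
    Returns a list of (user_id, steps_in_thousands, glucose_mg_dl).
    """
    steps_by_user_day = defaultdict(int)
    for user, day, steps in step_rows:
        steps_by_user_day[(user, day)] += steps  # sum passive entries per day

    pairs = []
    for user, day, glucose in glucose_rows:
        previous = (user, day - timedelta(days=1))
        if previous in steps_by_user_day:  # skip readings with no prior-day steps
            pairs.append((user, steps_by_user_day[previous] / 1000.0, glucose))
    return pairs


def within_person_slope(pairs):
    """Slope of glucose on steps after centering both within each person.

    Assumes at least one person has step totals that vary across days.
    """
    by_user = defaultdict(list)
    for user, x, y in pairs:
        by_user[user].append((x, y))
    sxy = sxx = 0.0
    for rows in by_user.values():
        mean_x = sum(x for x, _ in rows) / len(rows)
        mean_y = sum(y for _, y in rows) / len(rows)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in rows)
        sxx += sum((x - mean_x) ** 2 for x, _ in rows)
    return sxy / sxx


# Hypothetical records for one user (not GlucoSuccess data).
steps = [("a", date(2016, 1, 1), 4000), ("a", date(2016, 1, 1), 1000),
         ("a", date(2016, 1, 2), 10000)]
glucose = [("a", date(2016, 1, 2), 120.0), ("a", date(2016, 1, 3), 110.0)]
pairs = pair_previous_day_steps(steps, glucose)
slope = within_person_slope(pairs)  # mg/dL per 1000 steps in this toy data
```

Centering within each person removes stable between-person differences, which is the same concern a per-person random intercept addresses; a full mixed model would additionally provide standard errors and p-values.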
  • 59. http://bit.ly/glucosuccess GlucoSuccess-like apps can enable longitudinal and dynamic surveillance of P However: population-level differences and generalizability
  • 60. Possible to (re-)use high-throughput data (exposome, medical claims, devices) to discover the role of E (and G) in P. P = G + E. [Summary figures: a plot of −log10(p-value) by exposure category (acrylamide; allergen test; bacterial infection; cotinine; dialkyl; dioxins; furans, dibenzofuran; heavy metals; hydrocarbons; latex; nutrients: carotenoid, minerals, vitamins A, B, C, D, and E; PCBs; perchlorate; pesticides: atrazine, chlorophenol, organochlorine, organophosphate, pyrethroid; phenols; phthalates; phytoestrogens; polybrominated ethers; polyfluorochemicals; viral infection; volatile compounds), and the correlation globes for serum cotinine and serum total mercury with node legend: infectious agents, pollutants, nutrients and vitamins, demographic attributes.]
  • 61. Acknowledgements. Harvard DBMI: Isaac Kohane, Susanne Churchill, Stan Shaw, Nathan Palmer, Jenn Grandfield, Sunny Alvear, Michal Preminger. Harvard Chan: Hugues Aschard, Francesca Dominici. CDC: Marta Gwinn, Ridgely Green, Muin Khoury, Denise Lowe. Stanford: John Ioannidis, Atul Butte (UCSF). U Queensland: Jian Yang, Peter Visscher. Cochrane: Belinda Burford. RagGroup: Chirag Lakhani, Adam Brown, Danielle Rasooly, Arjun Manrai, Erik Corona, Nam Pho. Funding: NIH Common Fund, Big Data to Knowledge. Chirag J Patel, chirag@hms.harvard.edu, @chiragjp, www.chiragjpgroup.org