SlideShare a Scribd company logo
1 of 27
Download to read offline
Data analytics approaches to enable EWAS
Chirag J Patel and Nam Pho

Emory Exposome Workshop

06/16/16
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
Extensible & open-source analytics software library
Freely available exposome data for your research
(XWAS R package)
(NHANES: 40,000 individuals and 1,000 variables)
Materials for teaching and demonstration
Computer “environment” to conduct EWASs
(Docker container in RStudio)
goodness!XWAS + + =
http://bit.ly/exposome-analytics-course
Please let us know if you are using the resources

(or provide feedback)!
@chiragjp @nampho2
Chirag Nam
diseases

diabetes

cancer

heart disease
function

expression

telomeres

metabolome
external

geography

air pollution

income
internal

lead (serum)

nutrients (serum)

infection (urine)

metabolome
phenomeexposome
Real quick:
What is the exposome? What is the phenome?
phenome
Exposome associated with the phenome?

exposome
Analytic tools and big data infrastructure required to
associate exposome with phenome!
…and vice versa?
We can learn a thing or two from genomics
investigation…
phenomegenome
e.g., GWAS
Computational approaches fueled discovery 

of genetic variants in disease

(example: genome-wide association [GWAS])
A search engine for robust, reproducible genotype-
phenotype associations…
A RT I C L E S
13 autosomal loci exceeded the threshold for genome-wide significance (r2 < 0.05), and conditional analyses (see below) establish these SNPs
50 Locus established previously
Locus identified by current study
Locus not confirmed by current study
BCL11A
THADA
NOTCH2
ADAMTS9
IRS1
IGF2BP2
WFS1
ZBED3
CDKAL1
HHEX/IDE
KCNQ1 (2 signals*: )
TCF7L2
KCNJ11
CENTD2
MTNR1B
HMGA2 ZFAND6
PRC1
FTO
HNF1B DUSP9
Conditional analysis
Unconditional analysis
TSPAN8/LGR5
HNF1A
CDC123/CAMK1D
CHCHD9
CDKN2A/2B
SLC30A8
TP53INP1
JAZF1
KLF14
PPAR
40
30
–log10(P)–log10(P)
20
10
10
1 2 3 4 5 6 7 8
Chromosome
9 10 11 12 13 14 15 16 17 18 19 20 21 22 X
0
0
Suggestive statistical association (P < 1 10
–5
)
Association in identified or established region (P < 1 10
–4
)
Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta-
analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those
taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and
should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously
established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered
conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10−5), whereas
secondary signals close to already confirmed T2D loci are shown in purple (P < 10−4).
Voight et al, Nature Genetics 2012

N=8K T2D, 39K Controls
GWAS in Type 2 Diabetes
There are non-trivial data analytic challenges in
searching for exposome-phenome associations!
JAMA 2014

Pac Symp Biocomp 2015
Dense correlational web!
JAMA 2014

Pac Symp Biocomp 2015
what causes what?
confounding bias?
Carotenoids
Minerals A
B
C
D
E
Phytoestrogens
Cotinine
Hydrocarbons
Volatile compounds
Virus
Bacteria
Phenols
Phthalates
Polyfluorochemicals
Heavy metals
PCBs
Dioxins
Vitamins
Diakyl
Furans
Pesticides
Chloroacetanilide
Organochlorine
Organophosphate
Phenol
Pyrethyroid
Carotenoids
MineralsA
B
C
D
E
Phytoestrogens
Cotinine
Hydrocarbons
Volatilecompounds
Virus
Bacteria
Phenols
Phthalates
Polyfluorochemicals
Heavymetals
PCBs
Dioxins
Vitamins
Diakyl
Furans
Pesticides
Chloroacetanilide
Organochlorine
Organophosphate
Phenol
Pyrethyroid
0 0.5 1
Value
0200040006000
Colour key
and histogram
Count
Figure 2 Partial Pearson’s correlation between environmental factors. Partial Pearson’s correlation, adjusted by age and
BMI (and creatinine for factors measured in urine) for each of the 188 factors were computed for each survey separately.
We combined correlations between surveys using a meta-analytic random-effects estimate and displayed them in a heatmap
(above), and ordered them by environmental ‘class’, coloured as in Figure 1A. Pairs of factors where correlations could not
be computed are shown in grey IJE 2013
Interdependencies of the exposome:
Correlation globes paint a complex view of exposure
Pac Symp Biocomput. 2015

JECH. 2015http://bit.ly/globebrowse
Multiplicity: how to determine signal from noise?

type 1 error (spurious findings)
JAMA 2014
Suppose you are testing 1000 exposures in case-control study
(disease vs. healthy)…
… and there were no difference between the cases and controls…
…how many findings would be “significant” at a p-value threshold of
0.05 (due to chance)?
Regime of multiple tests and “signal to noise”:

Histogram of p-values in 2 scenarios: no difference and 5% different
pvalues
Frequency
0.0 0.2 0.4 0.6 0.8 1.0
020406080100
No difference
(no true associations)
pvalues.ewas
Frequency
0.0 0.2 0.4 0.6 0.8 1.0
020406080100140
5% exposures different
(5% true associations)
A B
Estimating the deviation from null:

QQplot: -log10(pvalues) in the null and EWAS distributions
0.0 0.5 1.0 1.5 2.0 2.5
010203040
expected −log10(pvalue)
10%different−log10(pvalue)
A
B
Bonferroni
False discovery rate
The tension between type 1 and type 2 errors:

Power and replication for robust associations!
Discovery sample sizes must be large to overcome
multiple testing and mitigate winner’s curse

Replication sample size must be large to detect
association
Discovery

N=1K
Replication

N=1K
What will the exposome data structure look like?:

a high-dimensioned 3D matrix of 

(1) exposure measurements 

on (2) individuals as a function of (3) time
tim
e
exposome
pollutants
diet
m
etabolites . . .
gut flora
CVD
xenobiotics
individuals
GWAS, RVAS,
pathway
analysis..etc.
EWAS,
PheWAS..etc.
genome(static)
mixtures of
exposures
drugs
integrative
(A) (C)
(B)
exposome
factors
nutrient value for
individual i
individual i
in review
tim
e
exposome
pollutants
diet
m
etabolites . . .
gut flora
CVD BP
can
xenobiotics
individuals
GWAS, RVAS,
pathway
analysis..etc.
EWAS,
PheWAS..etc.
genome(static)
mixtures of
exposures
drugs
integrative
(A) (C)
(B)
longitudinal
system
genome
in review
What will the exposome data structure look like?:

a high-dimensioned 3D matrix of 

(1) exposure measurements 

on (2) individuals as a function of (3) time
A schematic of a data-driven search for exposome-phenome
associations…
tim
e
exposome phenome
pollutants
diet
m
etabolites . . .
gut flora
height
w
eight
CVD BP
T2D
cancer
xenobiotics . . .
individuals
GWAS, RVAS,
pathway
analysis..etc.
EWAS,
PheWAS..etc.
genome(static)
mixtures of
exposures
tim
e
drugs
integrative
mixtures of
phenotypes
(A) (C)
(B)
EWAS
in review
Time for you to give it a try!
Extensible & open-source analytics software library
Freely available exposome data for your research
(XWAS R Package)
(NHANES: 40,000 individuals and 1,000 variables)
Materials for teaching and demonstration
Computer “environment” to conduct EWASs
(Docker container in RStudio)
goodness!XWAS + + =
Fully merged dataset:

National Health and Nutrition Examination Survey
since the 1960s

now biannual: 1999 onwards

10,000 participants per survey

The sample for the survey is selected to represent
the U.S. population of all ages. To produce reli-
able statistics, NHANES over-samples persons 60
and older, African Americans, and Hispanics.
Since the United States has experienced dramatic
growth in the number of older people during this
century, the aging population has major impli-
cations for health care needs, public policy, and
research priorities. NCHS is working with public
health agencies to increase the knowledge of the
health status of older Americans. NHANES has a
primary role in this endeavor.
All participants visit the physician. Dietary inter-
views and body measurements are included for
everyone. All but the very young have a blood
sample taken and will have a dental screening.
Depending upon the age of the participant, the
rest of the examination includes tests and proce-
dures to assess the various aspects of health listed
above. In general, the older the individual, the
more extensive the examination.
Survey Operations
Health interviews are conducted in respondents’
homes. Health measurements are performed in
specially-designed and equipped mobile centers,
which travel to locations throughout the country.
The study team consists of a physician, medical
and health technicians, as well as dietary and health
interviewers. Many of the study staff are
bilingual (English/Spanish).
An advanced computer system using high-
end servers, desktop PCs, and wide-area
networking collect and process all of the
NHANES data, nearly eliminating the need
for paper forms and manual coding operations.
This system allows interviewers to use note-
book computers with electronic pens. The staff
at the mobile center can automatically transmit
data into data bases through such devices as
digital scales and stadiometers. Touch-sensi-
tive computer screens let respondents enter
their own responses to certain sensitive ques-
tions in complete privacy. Survey information
is available to NCHS staff within 24 hours of
collection, which enhances the capability of
collecting quality data and increases the speed
with which results are released to the public.
In each location, local health and government
officials are notified of the upcoming survey.
Households in the study area receive a letter
from the NCHS Director to introduce the
survey. Local media may feature stories about
the survey.
NHANES is designed to facilitate and en-
courage participation. Transportation is provided
to and from the mobile center if necessary.
Participants receive compensation and a report
of medical findings is given to each participant.
All information collected in the survey is kept
strictly confidential. Privacy is protected by
public laws.
Uses of the Data
Information from NHANES is made available
through an extensive series of publications and
articles in scientific and technical journals. For
data users and researchers throughout the world,
survey data are available on the internet and on
easy-to-use CD-ROMs.
Research organizations, universities, health
care providers, and educators benefit from
survey information. Primary data users are
federal agencies that collaborated in the de-
sign and development of the survey. The
National Institutes of Health, the Food and
Drug Administration, and CDC are among the
agencies that rely upon NHANES to provide
data essential for the implementation and
evaluation of program activities. The U.S.
Department of Agriculture and NCHS coop-
erate in planning and reporting dietary and
nutrition information from the survey.
NHANES’ partnership with the U.S. Environ-
mental Protection Agency allows continued
study of the many important environmental
influences on our health.
• Physical fitness and physical functioning
• Reproductive history and sexual behavior
• Respiratory disease (asthma, chronic bron-
chitis, emphysema)
• Sexually transmitted diseases
• Vision
>250 exposures (serum + urine)

>85 quantitative clinical traits
(e.g., serum glucose, lipids, body
mass index)

Death index linkage (cause of
death)

Ready to analyze! N=41K with >1000 variables
(let us know; we can give you a DOI)
in review
http://bit.ly/ewas_nhanes
preterm birth
type 2 diabetes
type 2 diabetes genetics
lipids
blood pressure
income
mortality
telomere length
methodology (5)
13 EWAS-related manuscripts
Associations in Telomere Length:
Can you identify the associations in this graph?
IJE, 2016
0
1
2
3
4
−0.2 −0.1 0.0 0.1 0.2
effect size
−log10(pvalue)
FDR<5%
shorter telomeres longer telomeres
median N=3000; N range: 300-7000
??
Nam will show you how!
Associations in Telomere Length:
Can you identify the associations in this graph?
http://bit.ly/exposome-analytics-course
Please let us know if you are using the resources

(or provide feedback)!
@chiragjp @nampho2
Chirag Nam
Harvard DBMI
Isaac Kohane

Susanne Churchill

Stan Shaw

Jenn Grandfield

Michal Preminger
Chirag J Patel

chirag@hms.harvard.edu

@chiragjp

www.chiragjpgroup.org
NIH Common Fund

Big Data to Knowledge
Acknowledgements
RagGroup
Nam Pho
Chirag Lakhani
Adam Brown
Danielle Rasooly
Arjun Manrai
Grace Mahoney

Matthew Roy
Gary Miller
Kristine Dennis
Stanford
John PA Ioannidis

More Related Content

What's hot

Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryChirag Patel
 
AACR 041616 digital exposomes
AACR 041616 digital exposomesAACR 041616 digital exposomes
AACR 041616 digital exposomesChirag Patel
 
Studying the elusive in larger scale
Studying the elusive in larger scaleStudying the elusive in larger scale
Studying the elusive in larger scaleChirag Patel
 
EWAS and the exposome: Mt Sinai in Brescia 052119
EWAS and the exposome: Mt Sinai in Brescia 052119EWAS and the exposome: Mt Sinai in Brescia 052119
EWAS and the exposome: Mt Sinai in Brescia 052119Chirag Patel
 
Searching for predictors of male fecundity
Searching for predictors of male fecunditySearching for predictors of male fecundity
Searching for predictors of male fecundityChirag Patel
 
NCI systems epidemiology 03012019
NCI systems epidemiology 03012019NCI systems epidemiology 03012019
NCI systems epidemiology 03012019Chirag Patel
 
Building a search engine for exposures in disease
Building a search engine for exposures in disease Building a search engine for exposures in disease
Building a search engine for exposures in disease Chirag Patel
 
NSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopNSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopChirag Patel
 
Chirag patel unite for sight 041418
Chirag patel unite for sight 041418Chirag patel unite for sight 041418
Chirag patel unite for sight 041418Chirag Patel
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...ExternalEvents
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expressionmorenorossi
 
Japanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EJapanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EChirag Patel
 
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryMel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryJean-Claude Bradley
 
Soergel oa week-2014-lightning
Soergel oa week-2014-lightningSoergel oa week-2014-lightning
Soergel oa week-2014-lightningDavid Soergel
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesSean Ekins
 
Open zika presentation
Open zika presentation Open zika presentation
Open zika presentation Sean Ekins
 
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...Sean Ekins
 
Identification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesIdentification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesYoann Pageaud
 
Tulane Workshop on Multi-omics integration
Tulane Workshop on Multi-omics integrationTulane Workshop on Multi-omics integration
Tulane Workshop on Multi-omics integrationElia Brodsky
 

What's hot (20)

Informatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discoveryInformatics and data analytics to support for exposome-based discovery
Informatics and data analytics to support for exposome-based discovery
 
AACR 041616 digital exposomes
AACR 041616 digital exposomesAACR 041616 digital exposomes
AACR 041616 digital exposomes
 
Studying the elusive in larger scale
Studying the elusive in larger scaleStudying the elusive in larger scale
Studying the elusive in larger scale
 
EWAS and the exposome: Mt Sinai in Brescia 052119
EWAS and the exposome: Mt Sinai in Brescia 052119EWAS and the exposome: Mt Sinai in Brescia 052119
EWAS and the exposome: Mt Sinai in Brescia 052119
 
Searching for predictors of male fecundity
Searching for predictors of male fecunditySearching for predictors of male fecundity
Searching for predictors of male fecundity
 
NCI systems epidemiology 03012019
NCI systems epidemiology 03012019NCI systems epidemiology 03012019
NCI systems epidemiology 03012019
 
Building a search engine for exposures in disease
Building a search engine for exposures in disease Building a search engine for exposures in disease
Building a search engine for exposures in disease
 
NSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data WorkshopNSF Northeast Hub Big Data Workshop
NSF Northeast Hub Big Data Workshop
 
Chirag patel unite for sight 041418
Chirag patel unite for sight 041418Chirag patel unite for sight 041418
Chirag patel unite for sight 041418
 
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
Real-Time Genome Sequencing of Resistant Bacteria Provides Precision Infectio...
 
Montgomery expression
Montgomery expressionMontgomery expression
Montgomery expression
 
Japanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven EJapanese Environmental Children's Study and Data-driven E
Japanese Environmental Children's Study and Data-driven E
 
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug DiscoveryMel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
Mel Reichman on Pool Shark’s Cues for More Efficient Drug Discovery
 
Soergel oa week-2014-lightning
Soergel oa week-2014-lightningSoergel oa week-2014-lightning
Soergel oa week-2014-lightning
 
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of N...
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
 
Open zika presentation
Open zika presentation Open zika presentation
Open zika presentation
 
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...Looking Back at Mycobacterium tuberculosis Mouse  Efficacy Testing To Move Ne...
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...
 
Identification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databasesIdentification of PFOA linked metabolic diseases by crossing databases
Identification of PFOA linked metabolic diseases by crossing databases
 
Tulane Workshop on Multi-omics integration
Tulane Workshop on Multi-omics integrationTulane Workshop on Multi-omics integration
Tulane Workshop on Multi-omics integration
 

Similar to Data analytics to support exposome research course slides

How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data setsimprovemed
 
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...Servio Fernando Lima Reina
 
Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014LushPrize
 
Slides for st judes
Slides for st judesSlides for st judes
Slides for st judesSean Ekins
 
Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019Arjun Manrai
 
Talk on reproducibility in EEG research
Talk on reproducibility in EEG researchTalk on reproducibility in EEG research
Talk on reproducibility in EEG researchDorothy Bishop
 
Statistics on big biomedical data - Methods and pitfalls when analyzing high...
Statistics on big biomedical data - Methods and pitfalls when analyzing high...Statistics on big biomedical data - Methods and pitfalls when analyzing high...
Statistics on big biomedical data - Methods and pitfalls when analyzing high...Lars Juhl Jensen
 
How to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationHow to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationJoaquin Dopazo
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Fundación Ramón Areces
 
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...Statistics on big biomedical data - Methods and pitfalls when analyzing high-...
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...Lars Juhl Jensen
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Lars Juhl Jensen
 
Bioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyBioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyJoaquin Dopazo
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management inscit2006
 
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Sean Ekins
 
Casey Greene's Keynote for Rocky 2015
Casey Greene's Keynote for Rocky 2015Casey Greene's Keynote for Rocky 2015
Casey Greene's Keynote for Rocky 2015Casey Greene
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingJoaquin Dopazo
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingStephen Turner
 
Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Fondazione Giannino Bassetti
 
The state of the art in behavioral machine learning for healthcare
The state of the art in behavioral machine learning for healthcareThe state of the art in behavioral machine learning for healthcare
The state of the art in behavioral machine learning for healthcareAfrica Perianez
 

Similar to Data analytics to support exposome research course slides (20)

How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data sets
 
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
EUSFLAT 2019: explainable neuro fuzzy recurrent neural network to predict col...
 
Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014Grafström - Lush Prize Conference 2014
Grafström - Lush Prize Conference 2014
 
Slides for st judes
Slides for st judesSlides for st judes
Slides for st judes
 
Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019Arjun Manrai - National Academies Talk - June 6, 2019
Arjun Manrai - National Academies Talk - June 6, 2019
 
Talk on reproducibility in EEG research
Talk on reproducibility in EEG researchTalk on reproducibility in EEG research
Talk on reproducibility in EEG research
 
Statistics on big biomedical data - Methods and pitfalls when analyzing high...
Statistics on big biomedical data - Methods and pitfalls when analyzing high...Statistics on big biomedical data - Methods and pitfalls when analyzing high...
Statistics on big biomedical data - Methods and pitfalls when analyzing high...
 
How to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationHow to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical information
 
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
Bertrand de Meulder-El impacto de las ciencias ómicas en la medicina, la nutr...
 
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...Statistics on big biomedical data - Methods and pitfalls when analyzing high-...
Statistics on big biomedical data - Methods and pitfalls when analyzing high-...
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
 
Bioinformatics in dermato-oncology
Bioinformatics in dermato-oncologyBioinformatics in dermato-oncology
Bioinformatics in dermato-oncology
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
 
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R...
 
Casey Greene's Keynote for Rocky 2015
Casey Greene's Keynote for Rocky 2015Casey Greene's Keynote for Rocky 2015
Casey Greene's Keynote for Rocky 2015
 
From reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene findingFrom reads to pathways for efficient disease gene finding
From reads to pathways for efficient disease gene finding
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
 
Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...Big data and machine learning: opportunità per la medicina di precisione e i ...
Big data and machine learning: opportunità per la medicina di precisione e i ...
 
The state of the art in behavioral machine learning for healthcare
The state of the art in behavioral machine learning for healthcareThe state of the art in behavioral machine learning for healthcare
The state of the art in behavioral machine learning for healthcare
 
2016 AGT Poster
2016 AGT Poster2016 AGT Poster
2016 AGT Poster
 

Recently uploaded

Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Servicevidya singh
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋TANUJA PANDEY
 
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...narwatsonia7
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In AhmedabadO963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In AhmedabadGenuine Call Girls
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...chandars293
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...narwatsonia7
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...parulsinha
 
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeCall Girls Delhi
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...parulsinha
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Ishani Gupta
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableDipal Arora
 
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...perfect solution
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Genuine Call Girls
 

Recently uploaded (20)

Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
 
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Jabalpur Just Call 8250077686 Top Class Call Girl Service Available
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
 
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟  9332606886 ⟟ Call Me For G...
Top Rated Bangalore Call Girls Ramamurthy Nagar ⟟ 9332606886 ⟟ Call Me For G...
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In AhmedabadO963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
O963O942363 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...Top Rated Bangalore Call Girls Mg Road ⟟   9332606886 ⟟ Call Me For Genuine S...
Top Rated Bangalore Call Girls Mg Road ⟟ 9332606886 ⟟ Call Me For Genuine S...
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Kochi Just Call 8250077686 Top Class Call Girl Service Available
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
(Low Rate RASHMI ) Rate Of Call Girls Jaipur ❣ 8445551418 ❣ Elite Models & Ce...
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Aurangabad Just Call 8250077686 Top Class Call Girl Service Available
 
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
 
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 9630942363 Top Class Pondicherry Escort Servi...
 

Data analytics to support exposome research course slides

  • 1. Data analytics approaches to enable EWAS Chirag J Patel and Nam Pho Emory Exposome Workshop 06/16/16 chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org
  • 2. Extensible & open-source analytics software library Freely available exposome data for your research (XWAS R package) (NHANES: 40,000 individuals and 1,000 variables) Materials for teaching and demonstration Computer “environment” to conduct EWASs (Docker container in RStudio) goodness!XWAS + + =
  • 3. http://bit.ly/exposome-analytics-course Please let us know if you are using the resources (or provide feedback)! @chiragjp @nampho2 Chirag Nam
  • 4. diseases diabetes cancer heart disease function expression telomeres metabolome external geography air pollution income internal lead (serum) nutrients (serum) infection (urine) metabolome phenomeexposome Real quick: What is the exposome? What is the phenome?
  • 5. phenome Exposome associated with the phenome? exposome Analytic tools and big data infrastructure required to associate exposome with phenome! …and vice versa?
  • 6. We can learn a thing or two from genomics investigation… phenomegenome e.g., GWAS
  • 7. Computational approaches fueled discovery of genetic variants in disease (example: genome-wide association [GWAS]) A search engine for robust, reproducible genotype- phenotype associations… A RT I C L E S 13 autosomal loci exceeded the threshold for genome-wide significance (r2 < 0.05), and conditional analyses (see below) establish these SNPs 50 Locus established previously Locus identified by current study Locus not confirmed by current study BCL11A THADA NOTCH2 ADAMTS9 IRS1 IGF2BP2 WFS1 ZBED3 CDKAL1 HHEX/IDE KCNQ1 (2 signals*: ) TCF7L2 KCNJ11 CENTD2 MTNR1B HMGA2 ZFAND6 PRC1 FTO HNF1B DUSP9 Conditional analysis Unconditional analysis TSPAN8/LGR5 HNF1A CDC123/CAMK1D CHCHD9 CDKN2A/2B SLC30A8 TP53INP1 JAZF1 KLF14 PPAR 40 30 –log10(P)–log10(P) 20 10 10 1 2 3 4 5 6 7 8 Chromosome 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X 0 0 Suggestive statistical association (P < 1 10 –5 ) Association in identified or established region (P < 1 10 –4 ) Figure 1 Genome-wide Manhattan plots for the DIAGRAM+ stage 1 meta-analysis. Top panel summarizes the results of the unconditional meta- analysis. Previously established loci are denoted in red and loci identified by the current study are denoted in green. The ten signals in blue are those taken forward but not confirmed in stage 2 analyses. The genes used to name signals have been chosen on the basis of proximity to the index SNP and should not be presumed to indicate causality. The lower panel summarizes the results of equivalent meta-analysis after conditioning on 30 previously established and newly identified autosomal T2D-associated SNPs (denoted by the dotted lines below these loci in the upper panel). Newly discovered conditional signals (outside established loci) are denoted with an orange dot if they show suggestive levels of significance (P < 10−5), whereas secondary signals close to already confirmed T2D loci are shown in purple (P < 10−4). Voight et al, Nature Genetics 2012 N=8K T2D, 39K Controls GWAS in Type 2 Diabetes
  • 8. There are non-trivial data analytic challenges in searching for exposome-phenome associations! JAMA 2014 Pac Symp Biocomp 2015
  • 9. Dense correlational web! JAMA 2014 Pac Symp Biocomp 2015 what causes what? confounding bias?
  • 10. Carotenoids Minerals A B C D E Phytoestrogens Cotinine Hydrocarbons Volatile compounds Virus Bacteria Phenols Phthalates Polyfluorochemicals Heavy metals PCBs Dioxins Vitamins Diakyl Furans Pesticides Chloroacetanilide Organochlorine Organophosphate Phenol Pyrethyroid Carotenoids MineralsA B C D E Phytoestrogens Cotinine Hydrocarbons Volatilecompounds Virus Bacteria Phenols Phthalates Polyfluorochemicals Heavymetals PCBs Dioxins Vitamins Diakyl Furans Pesticides Chloroacetanilide Organochlorine Organophosphate Phenol Pyrethyroid 0 0.5 1 Value 0200040006000 Colour key and histogram Count Figure 2 Partial Pearson’s correlation between environmental factors. Partial Pearson’s correlation, adjusted by age and BMI (and creatinine for factors measured in urine) for each of the 188 factors were computed for each survey separately. We combined correlations between surveys using a meta-analytic random-effects estimate and displayed them in a heatmap (above), and ordered them by environmental ‘class’, coloured as in Figure 1A. Pairs of factors where correlations could not be computed are shown in grey IJE 2013
  • 11. Interdependencies of the exposome: Correlation globes paint a complex view of exposure Pac Symp Biocomput. 2015 JECH. 2015http://bit.ly/globebrowse
  • 12. Multiplicity: how to determine signal from noise? type 1 error (spurious findings) JAMA 2014
  • 13. Suppose you are testing 1000 exposures in case-control study (disease vs. healthy)… … and there were no difference between the cases and controls… …how many findings would be “significant” at a p-value threshold of 0.05 (due to chance)?
  • 14. Regime of multiple tests and “signal to noise”: Histogram of p-values in 2 scenarios: no difference and 5% different pvalues Frequency 0.0 0.2 0.4 0.6 0.8 1.0 020406080100 No difference (no true associations) pvalues.ewas Frequency 0.0 0.2 0.4 0.6 0.8 1.0 020406080100140 5% exposures different (5% true associations) A B
  • 15. Estimating the deviation from null: QQplot: -log10(pvalues) in the null and EWAS distributions 0.0 0.5 1.0 1.5 2.0 2.5 010203040 expected −log10(pvalue) 10%different−log10(pvalue) A B Bonferroni False discovery rate
  • 16. The tension between type 1 and type 2 errors: Power and replication for robust associations! Discovery sample sizes must be large to overcome multiple testing and mitigate winner’s curse Replication sample size must be large to detect association Discovery N=1K Replication N=1K
  • 17. What will the exposome data structure look like?: a high-dimensioned 3D matrix of (1) exposure measurements on (2) individuals as a function of (3) time tim e exposome pollutants diet m etabolites . . . gut flora CVD xenobiotics individuals GWAS, RVAS, pathway analysis..etc. EWAS, PheWAS..etc. genome(static) mixtures of exposures drugs integrative (A) (C) (B) exposome factors nutrient value for individual i individual i in review
  • 18. tim e exposome pollutants diet m etabolites . . . gut flora CVD BP can xenobiotics individuals GWAS, RVAS, pathway analysis..etc. EWAS, PheWAS..etc. genome(static) mixtures of exposures drugs integrative (A) (C) (B) longitudinal system genome in review What will the exposome data structure look like?: a high-dimensioned 3D matrix of (1) exposure measurements on (2) individuals as a function of (3) time
  • 19. A schematic of a data-driven search for exposome-phenome associations… tim e exposome phenome pollutants diet m etabolites . . . gut flora height w eight CVD BP T2D cancer xenobiotics . . . individuals GWAS, RVAS, pathway analysis..etc. EWAS, PheWAS..etc. genome(static) mixtures of exposures tim e drugs integrative mixtures of phenotypes (A) (C) (B) EWAS in review
  • 20. Time for you to give it a try!
  • 21. Extensible & open-source analytics software library Freely available exposome data for your research (XWAS R Package) (NHANES: 40,000 individuals and 1,000 variables) Materials for teaching and demonstration Computer “environment” to conduct EWASs (Docker container in RStudio) goodness!XWAS + + =
  • 22. Fully merged dataset: National Health and Nutrition Examination Survey since the 1960s now biannual: 1999 onwards 10,000 participants per survey The sample for the survey is selected to represent the U.S. population of all ages. To produce reli- able statistics, NHANES over-samples persons 60 and older, African Americans, and Hispanics. Since the United States has experienced dramatic growth in the number of older people during this century, the aging population has major impli- cations for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor. All participants visit the physician. Dietary inter- views and body measurements are included for everyone. All but the very young have a blood sample taken and will have a dental screening. Depending upon the age of the participant, the rest of the examination includes tests and proce- dures to assess the various aspects of health listed above. In general, the older the individual, the more extensive the examination. Survey Operations Health interviews are conducted in respondents’ homes. Health measurements are performed in specially-designed and equipped mobile centers, which travel to locations throughout the country. The study team consists of a physician, medical and health technicians, as well as dietary and health interviewers. Many of the study staff are bilingual (English/Spanish). An advanced computer system using high- end servers, desktop PCs, and wide-area networking collect and process all of the NHANES data, nearly eliminating the need for paper forms and manual coding operations. This system allows interviewers to use note- book computers with electronic pens. The staff at the mobile center can automatically transmit data into data bases through such devices as digital scales and stadiometers. Touch-sensi- tive computer screens let respondents enter their own responses to certain sensitive ques- tions in complete privacy. Survey information is available to NCHS staff within 24 hours of collection, which enhances the capability of collecting quality data and increases the speed with which results are released to the public. In each location, local health and government officials are notified of the upcoming survey. Households in the study area receive a letter from the NCHS Director to introduce the survey. Local media may feature stories about the survey. NHANES is designed to facilitate and en- courage participation. Transportation is provided to and from the mobile center if necessary. Participants receive compensation and a report of medical findings is given to each participant. All information collected in the survey is kept strictly confidential. Privacy is protected by public laws. Uses of the Data Information from NHANES is made available through an extensive series of publications and articles in scientific and technical journals. For data users and researchers throughout the world, survey data are available on the internet and on easy-to-use CD-ROMs. Research organizations, universities, health care providers, and educators benefit from survey information. Primary data users are federal agencies that collaborated in the de- sign and development of the survey. The National Institutes of Health, the Food and Drug Administration, and CDC are among the agencies that rely upon NHANES to provide data essential for the implementation and evaluation of program activities. The U.S. Department of Agriculture and NCHS coop- erate in planning and reporting dietary and nutrition information from the survey. NHANES’ partnership with the U.S. Environ- mental Protection Agency allows continued study of the many important environmental influences on our health. • Physical fitness and physical functioning • Reproductive history and sexual behavior • Respiratory disease (asthma, chronic bron- chitis, emphysema) • Sexually transmitted diseases • Vision >250 exposures (serum + urine) >85 quantitative clinical traits (e.g., serum glucose, lipids, body mass index) Death index linkage (cause of death) Ready to analyze! N=41K with >1000 variables (let us know; we can give you a DOI) in review
  • 23. http://bit.ly/ewas_nhanes preterm birth type 2 diabetes type 2 diabetes genetics lipids blood pressure income mortality telomere length methodology (5) 13 EWAS-related manuscripts
  • 24. Associations in Telomere Length: Can you identify the associations in this graph? IJE, 2016 0 1 2 3 4 −0.2 −0.1 0.0 0.1 0.2 effect size −log10(pvalue) FDR<5% shorter telomeres longer telomeres median N=3000; N range: 300-7000 ??
  • 25. Nam will show you how! Associations in Telomere Length: Can you identify the associations in this graph?
  • 26. http://bit.ly/exposome-analytics-course Please let us know if you are using the resources (or provide feedback)! @chiragjp @nampho2 Chirag Nam
  • 27. Harvard DBMI Isaac Kohane Susanne Churchill Stan Shaw Jenn Grandfield Michal Preminger Chirag J Patel chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org NIH Common Fund Big Data to Knowledge Acknowledgements RagGroup Nam Pho Chirag Lakhani Adam Brown Danielle Rasooly Arjun Manrai Grace Mahoney Matthew Roy Gary Miller Kristine Dennis Stanford John PA Ioannidis