Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

Partitioning heritability by functional
annotation using summary statistics
Hilary Finucane
MIT Department of Mathematics
HSPH Department of Epidemiology
October 21, 2014

Acknowledgements
• Brendan Bulik-Sullivan
• Alkes Price
• Ben Neale
• Alexander Gusev
• Nick Patterson
• Po-Ru Loh
• Gosia Trynka
• Han Xu
• Verneri Anttila
• Yakir Reshef
• Chongzhi Zang
• Stephan Ripke
• Schizophrenia Working Group of the PGC
• Shaun Purcell
• Mark Daly
• Eli Stahl
• Soumya Raychaudhuri
• Sara Lindstrom

Partitioning heritability by functional
annotation is an important goal
• Learn about genetic architecture of disease
– Where does the heritability lie?
• Learn about disease biology
– What are the relevant cell types?
• Learn about the functional annotations
– Which functional annotations show the highest
enrichments?
• Downstream applications
– Fine mapping
– Risk prediction
– GWAS priors
Maurano et al. 2012 Science
Trynka et al. 2013 Nat Genet
Pickrell 2014 AJHG

What is partitioned heritability?
• Our model is
  $Y = \sum_j X_j \beta_j + \varepsilon$
where
• $Y$ is an individual’s phenotype,
• $X_j$ is an individual’s genotype at the j-th SNP
(normalized to mean 0 and variance 1),
• $\beta_j$ is the effect of SNP j, and
• $\varepsilon$ is noise and random environmental effects.

• We define heritability as
  $h^2 = \sum_j \beta_j^2$
and the heritability of a category $C$ as
  $h^2(C) = \sum_{j \in C} \beta_j^2$
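
A minimal sketch of these definitions, not the author's code. It assumes heritability is the sum of squared effects of normalized genotypes, and that the phenotype has variance close to 1. The sizes, the mask `in_category`, and the 3x weighting of effects inside the category are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_individuals, n_snps = 2000, 500      # hypothetical sizes
h2_target = 0.5                        # total heritability to simulate

# Hypothetical annotation: the first 100 SNPs form the category C.
in_category = np.zeros(n_snps, dtype=bool)
in_category[:100] = True

# Genotypes normalized to mean 0 and variance 1 (independent SNPs for simplicity).
X = rng.standard_normal((n_individuals, n_snps))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Effect sizes: SNPs in C get 3x the per-SNP effect variance of the rest (assumption).
weights = np.where(in_category, 3.0, 1.0)
beta = rng.standard_normal(n_snps) * np.sqrt(weights)
beta *= np.sqrt(h2_target / np.sum(beta**2))   # rescale so sum(beta^2) equals h2_target

# Noise with variance 1 - h2, so that Var(Y) is approximately 1.
eps = rng.standard_normal(n_individuals) * np.sqrt(1.0 - h2_target)
Y = X @ beta + eps

h2 = np.sum(beta**2)                   # total heritability
h2_C = np.sum(beta[in_category]**2)    # heritability of category C
print(f"h2 = {h2:.3f}, h2(C) = {h2_C:.3f}, share in C = {h2_C / h2:.1%}")
```

Because the category holds 20% of the SNPs but its effects are drawn with larger variance, its share of heritability should come out above 20%.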

Partitioning heritability
using variance components has yielded
many insights
• 31% of schizophrenia SNP-heritability lies in CNS+
gene regions spanning 20% of the genome [1].
• 28% of Tourette syndrome SNP-heritability and
29% of OCD SNP-heritability lie in parietal lobe
eQTLs spanning 5% of the genome [2].
• 79% of SNP-heritability, averaged across WTCCC
and WTCCC2 traits, lies in DHS regions spanning
16% of the genome [3].
[1] Lee et al. 2012 Nat Genet
[2] Davis et al. 2013 PLoS Genet
[3] Gusev et al. in press AJHG

A method for partitioning heritability
from summary statistics is needed
• Variance components methods are intractable
at very large sample sizes.
• There is lots of information in large meta-analyses.
• Lots of publicly available summary statistics
allow us to compare many phenotypes and
many annotations to get a big picture.

Our method partitions heritability
from summary statistics
• Input:
– Sample size and p-value for every SNP tested in a
large GWAS of a quantitative or case-control trait
– LD information from a reference panel like 1000G
– Genome annotation of interest
– Other genome annotations to include in the
model.
• Output:
– Estimated proportion of heritability that falls
within the annotation of interest.
– Enrichment = (% of heritability) / (% of SNPs)
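
The enrichment ratio can be illustrated with a few lines of Python. This is only a sketch of the formula above; the function name `enrichment` and the example numbers are hypothetical.

```python
def enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = (proportion of heritability) / (proportion of SNPs)."""
    if not 0.0 < prop_snps <= 1.0:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Hypothetical annotation containing 25% of SNPs and 50% of heritability.
example = enrichment(0.50, 0.25)
print(example)
```

A value above 1 means the annotation carries more heritability than its share of SNPs would suggest; a value below 1 means it is depleted.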

Outline
• Description of method
• Validation on simulated data
• Results on real data

LD is important for summary statistics-based
methods
• Some SNPs have a lot of LD to other SNPs in the same category.
• Some SNPs have a lot of LD to SNPs in other categories.
• Some SNPs do not have a lot of LD to other SNPs.
Our solution: LD Score Regression.
See Bulik-Sullivan et al. bioRxiv (under revision, Nat Genet)
and ASHG 2014 poster 1787T (Bulik-Sullivan)

LD Score Regression: basic intuition
[Figure: chi-square statistics across a high LD region and a low LD region]
• Polygenicity causes more chi-square statistic inflation
in high LD regions than in low LD regions.
– Mean chi-square for high LD region: high
– Mean chi-square for low LD region: low

Multivariate LD Score Regression: basic
intuition
• Enriched category → BIG difference in chi-square between SNPs with lots of LD vs little LD to the category (high chi-square vs low chi-square).
• Depleted category → SMALL difference in chi-square between SNPs with lots of LD vs little LD to the category (low chi-square vs low chi-square).

Multivariate LD Score regression
allows us to partition SNP heritability
• Multivariate LD Score of a SNP with respect to a
category: the sum, over all SNPs in the category,
of r^2 between that SNP and each SNP in the category.
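
As a rough sketch of this definition (not the reference implementation), LD Scores can be computed from a genotype matrix with NumPy. The function name `multivariate_ld_scores` and the toy data are hypothetical. Practical implementations typically restrict the sum to a window around each SNP and correct r^2 for sampling bias; neither is done here.

```python
import numpy as np


def multivariate_ld_scores(genotypes, categories):
    """Return an (n_snps, n_categories) matrix of LD Scores.

    genotypes:  (n_individuals, n_snps) array from a reference panel.
    categories: (n_snps, n_categories) 0/1 array; entry [k, c] is 1
                if SNP k belongs to category c.
    """
    r = np.corrcoef(genotypes, rowvar=False)   # SNP-by-SNP correlation matrix
    r2 = r ** 2
    # Entry [j, c] is the sum of r2[j, k] over all SNPs k in category c.
    return r2 @ categories


# Toy reference panel: 200 individuals, 6 SNPs coded 0/1/2 (hypothetical data).
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(200, 6)).astype(float)
G[:, 1] = G[:, 0]                  # make SNPs 0 and 1 perfectly correlated

# SNPs 0-2 are in the first category, SNPs 3-5 in the second.
cats = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
ld = multivariate_ld_scores(G, cats)
print(ld.round(2))
```

SNPs 0 and 1 are identical, so each has an LD Score of at least 2 with respect to the first category: r^2 = 1 with itself plus r^2 = 1 with its copy.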

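• Derivations based on a polygenic model give:
  $E[\chi^2_j] = N \sum_C \tau_C \, \ell(j, C) + N a + 1$
  where $N$ is the sample size, $\ell(j, C)$ is the LD Score of SNP j with respect to category $C$, $\tau_C$ is the per-SNP contribution of category $C$ to heritability, and $a$ measures confounding bias.
• Easily extends to overlapping categories.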

To estimate partitioned heritability:
• Estimate LD Scores from a reference panel.
• Regress chi-square statistics on LD Scores.
• The slopes give the partitioned heritability.
• For best results, use many categories!
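
The steps above can be sketched on synthetic data. This is an illustration under stated assumptions, not the author's implementation: it assumes the expected chi-square of SNP j is N times the sum over categories of tau_C times the LD Score, plus 1 (no confounding term), and that categories are disjoint so that h2(C) = M_C * tau_C. The LD Scores are drawn at random rather than estimated from a reference panel, the fit is plain unweighted least squares, and all sizes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

N = 5_000                              # hypothetical GWAS sample size
M_C = np.array([10_000, 40_000])       # SNPs per (disjoint) category
true_h2_C = np.array([0.2, 0.3])       # heritability assigned to each category
tau = true_h2_C / M_C                  # per-SNP heritability in each category

# Synthetic LD Scores: one row per SNP, one column per category.
M = M_C.sum()
ld_scores = np.column_stack([
    rng.uniform(0, 40, size=M),
    rng.uniform(0, 160, size=M),
])

# Assumed model: E[chi2_j] = N * sum_C tau_C * l(j, C) + 1.
expected_chi2 = N * ld_scores @ tau + 1.0
# Multiply by a chi-square(1) draw (mean 1) to get noisy statistics with that mean.
chi2 = expected_chi2 * rng.chisquare(df=1, size=M)

# Regress chi-square statistics on LD Scores (unweighted, with an intercept).
design = np.column_stack([ld_scores, np.ones(M)])
coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
slopes, intercept = coef[:2], coef[2]

# The slopes estimate N * tau_C; rescale to per-category heritability.
est_h2_C = slopes / N * M_C
print("true h2(C):     ", true_h2_C)
print("estimated h2(C):", est_h2_C.round(3))
```

Unweighted least squares keeps the sketch short. It ignores that the variance of chi-square statistics grows with the LD Score and that nearby SNPs are correlated, so it should not be used to obtain standard errors.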

Multivariate LD Score regression works
in simulations
Estimated h2(DHS), with standard errors in parentheses:

Simulation | True h2(DHS) | REML (2 cat) | LD Score | LD Score categories
Null | 0.092 | 0.089 (0.006) | 0.086 (0.012) | 27
DHS 3x enriched | 0.276 | 0.281 (0.006) | 0.278 (0.013) | 27
FANTOM5 Enhancer* causal | 0.379 | 0.531 (0.007) | 0.361 (0.015) | 27
FANTOM5 Enhancer* causal, excluded from the model | 0.379 | 0.531 (0.007) | 0.318 (0.014) | 26

• Standard errors are over 100 simulations.
• Simulated quantitative phenotype with h2 = 0.5.
• M = 110,444, N = 2,713
* Andersson et al. 2014 Nature

Datasets analyzed
Phenotype | Citation | Sample size
Schizophrenia | SCZ working grp of the PGC, 2014 Nature | 70,100
Bipolar Disorder | Bip working grp of the PGC, 2011 Nat Genet | 16,731
Rheumatoid Arthritis* | Okada et al., 2014 Nature | 38,242
Crohn’s Disease* | Jostins et al., 2012 Nature | 20,883
Ulcerative Colitis* | Jostins et al., 2012 Nature | 27,432
Height | Wood et al., 2014 Nature Genetics | 253,280
BMI | Speliotes et al., 2010 Nature Genetics | 123,865
Coronary Artery Disease | Schunkert et al., 2011 Nature Genetics | 86,995
College (yes/no) | Rietveld et al., 2013 Science | 126,559
Type 2 Diabetes | Morris et al., 2012 Nature Genetics | 69,033
*HLA locus excluded from all analyses for autoimmune traits

Annotations used
Mark | Source/reference
Coding, 3’ UTR, 5’ UTR, Promoter, Intron | UCSC; Gusev et al., in press AJHG
Digital Genomic Footprint, TFBS | ENCODE; Gusev et al., in press AJHG
CTCF binding site, Promoter Flanking, Repressed, Transcribed, TSS, Enhancer, Weak Enhancer | ENCODE; Hoffman et al., 2012 Nucleic Acids Research
DHS, fetal DHS, H3K4me1, H3K4me3, H3K9ac | Trynka et al., 2013 Nature Genetics*
Conserved | Lindblad-Toh et al., 2011 Nature
FANTOM5 Enhancer | Andersson et al., 2014 Nature
lincRNAs | Cabili et al., 2011 Genes Dev
DHS and DHS promoter | Maurano et al., 2012 Science
H3K27ac | Roadmap; PGC2 2014 Nature
*Post-processed from ENCODE and Roadmap data by S. Raychaudhuri and X. Liu labs

Coding, Intergenic, Enhancer, H3K4me3, and DHS
enrichments in six phenotypes
(Bars indicate 95% confidence intervals)

Coding, Intergenic, Enhancer, H3K4me3, DHS, and
Conserved* enrichments in six phenotypes
*Lindblad-Toh et al., 2011 Nature

Coding, Intergenic, Enhancer, H3K4me3, DHS, and
FANTOM5 Enhancer* enrichments in six phenotypes
*Andersson et al., 2014 Nature

Cell-type specific H3K27ac enrichments
inform trait biology
• We group 56 cell types into 7 basic categories.
• For each trait (10 traits)
– For each category (7 categories)
• We assess the significance of the improvement to
the model from adding that category.

Conclusions
• Many annotations are enriched in many
phenotypes.
• Conserved regions, 2.6% of SNPs, are
estimated to explain 30% of heritability on
average.
• FANTOM5 Enhancers are extremely enriched
in autoimmune traits.
• H3K27ac cell-type enrichment matches and
extends our understanding of disease biology.
