Partitioning heritability by functional 
annotation using summary statistics 
Hilary Finucane 
MIT Department of Mathematics 
HSPH Department of Epidemiology 
October 21, 2014
Acknowledgements 
• Brendan Bulik-Sullivan 
• Alkes Price 
• Ben Neale 
• Alexander Gusev 
• Nick Patterson 
• Po-Ru Loh 
• Gosia Trynka 
• Han Xu 
• Verneri Anttila 
• Yakir Reshef 
• Chongzhi Zang 
• Stephan Ripke 
• Schizophrenia Working Group of the PGC 
• Shaun Purcell 
• Mark Daly 
• Eli Stahl 
• Soumya Raychaudhuri 
• Sara Lindstrom
Partitioning heritability by functional 
annotation is an important goal 
• Learn about genetic architecture of disease 
– Where does the heritability lie? 
• Learn about disease biology 
– What are the relevant cell types? 
• Learn about the functional annotations 
– Which functional annotations show the highest 
enrichments? 
• Downstream applications 
– Fine mapping 
– Risk prediction 
– GWAS priors 
Maurano et al. 2012 Science 
Trynka et al. 2013 Nat Genet 
Pickrell 2014 AJHG
What is partitioned heritability? 
• Our model is 
$Y = \sum_j X_j \beta_j + \varepsilon$ 
where 
• Y is an individual’s phenotype, 
• Xj is an individual’s genotype at the j-th SNP 
(normalized to mean 0 and variance 1), 
• βj is the effect of SNP j, and 
• ε is noise and random environmental effects.
What is partitioned heritability? 
• Our model is $Y = \sum_j X_j \beta_j + \varepsilon$ 
• We define heritability as $h^2 = \sum_j \beta_j^2$ 
and the heritability of a category $C$ as $h^2(C) = \sum_{j \in C} \beta_j^2$
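As a minimal sketch of these definitions, assuming the phenotype is scaled to variance 1 so that squared effects sum directly to heritability, total and per-category heritability can be computed from a vector of effects. The helper name `partition_heritability` and the toy effect sizes are hypothetical and chosen only for illustration.

```python
import math

def partition_heritability(betas, in_category):
    """Return (h2, h2_C): sums of squared effects over all SNPs and over the category."""
    h2 = sum(b * b for b in betas)
    h2_c = sum(b * b for b, flag in zip(betas, in_category) if flag)
    return h2, h2_c

# Hypothetical toy architecture: 1,000 SNPs, the first 100 in the category,
# each carrying 3x the per-SNP variance of the remaining 900 SNPs.
n_snps, n_in_cat = 1000, 100
var_out = 0.5 / (3 * n_in_cat + (n_snps - n_in_cat))  # total h2 fixed at 0.5
var_in = 3 * var_out
betas = [math.sqrt(var_in)] * n_in_cat + [math.sqrt(var_out)] * (n_snps - n_in_cat)
in_category = [True] * n_in_cat + [False] * (n_snps - n_in_cat)

h2, h2_cat = partition_heritability(betas, in_category)
print(f"h2 = {h2:.3f}, h2(C) = {h2_cat:.3f}")  # h2 = 0.500, h2(C) = 0.125
```

In this toy case the category holds 10% of the SNPs and 25% of the heritability.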
Partitioning heritability 
using variance components has yielded 
many insights 
• 31% of schizophrenia SNP-heritability lies in CNS+ 
gene regions spanning 20% of the genome [1]. 
• 28% of Tourette syndrome SNP-heritability and 
29% of OCD SNP-heritability lie in parietal lobe 
eQTLs spanning 5% of the genome [2]. 
• 79% of SNP-heritability, averaged across WTCCC 
and WTCCC2 traits, lies in DHS regions spanning 
16% of the genome [3]. 
[1] Lee et al. 2012 Nat Genet 
[2] Davis et al. 2013 PLoS Genet 
[3] Gusev et al. in press AJHG
A method for partitioning heritability 
from summary statistics is needed 
• Variance components methods are intractable 
at very large sample sizes. 
• There is lots of information in large meta-analyses. 
• Lots of publicly available summary statistics 
allow us to compare many phenotypes and 
many annotations to get a big picture.
Our method partitions heritability 
from summary statistics 
• Input: 
– Sample size and p-value for every SNP tested in a 
large GWAS of a quantitative or case-control trait 
– LD information from a reference panel like 1000G 
– Genome annotation of interest 
– Other genome annotations to include in the 
model.
• Output: 
– Estimated proportion of heritability that falls 
within the annotation of interest. 
– Enrichment = (% of heritability) / (% of SNPs)
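The enrichment ratio can be illustrated with a few lines of arithmetic. This is only a sketch: the function name `enrichment` is hypothetical, and the inputs are the conserved-region figures quoted in the conclusions of this talk (2.6% of SNPs, an estimated 30% of heritability).

```python
def enrichment(prop_h2, prop_snps):
    """Enrichment = (proportion of heritability) / (proportion of SNPs)."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps

# Conserved regions: 2.6% of SNPs, estimated to explain 30% of heritability.
conserved = enrichment(0.30, 0.026)
print(f"{conserved:.1f}x")  # 11.5x
```

An enrichment of 1 means the annotation carries exactly its share of heritability; values below 1 indicate depletion.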
Outline 
• Description of method 
• Validation on simulated data 
• Results on real data
LD is important for summary statistics-based 
methods 
• Some SNPs have a lot of LD 
to other SNPs in the same 
category. 
• Some SNPs have a lot of LD 
to SNPs in other categories. 
• Some SNPs do not have a lot 
of LD to other SNPs.
Our solution: LD Score Regression. 
See Bulik-Sullivan et al., bioRxiv (under revision, Nat Genet) 
and ASHG 2014 poster 1787T (Bulik-Sullivan)
LD Score Regression: basic intuition 
• Polygenicity causes more chi-square statistic inflation 
in high LD regions than in low LD regions 
[Figure: chi-square statistics in a high LD region (mean chi-square: high) and in a low LD region (mean chi-square: low)]
Multivariate LD Score Regression: basic 
intuition 
• Enriched category: high chi-square with lots of LD to the category, low chi-square with little LD to it 
→ BIG difference between lots of LD vs little LD to the category 
• Depleted category: low chi-square with lots of LD to the category, low chi-square with little LD to it 
→ SMALL difference between lots of LD vs little LD to the category
Multivariate LD Score regression 
allows us to partition SNP heritability 
• Multivariate LD Score of SNP j for a category C: the sum, 
over all SNPs k in C, of r^2 between j and k, 
$\ell(j, C) = \sum_{k \in C} r^2_{jk}$.
• Derivations based on a polygenic model give: 
$E[\chi^2_j] = 1 + N \sum_C \frac{h^2(C)}{M_C} \, \ell(j, C)$ 
where N is the sample size, M_C is the number of SNPs in category C, 
and $\ell(j, C)$ is the multivariate LD Score of SNP j for category C. 
• Easily extends to overlapping categories.
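The multivariate LD Score definition can be sketched with NumPy as follows. This is a simplified illustration, not the authors' implementation: it uses plain in-sample r^2 across all SNP pairs, with no windowing and no bias correction, and it assumes every SNP is polymorphic. The function name `multivariate_ld_scores` and the toy genotypes are hypothetical.

```python
import numpy as np

def multivariate_ld_scores(genotypes, annot):
    """Sum of r^2 between each SNP and the SNPs of each category.

    genotypes: (N individuals, M SNPs) array; every column must vary.
    annot:     (M SNPs, C categories) 0/1 membership matrix.
    Returns an (M, C) array whose entry [j, c] is the LD Score of SNP j for category c.
    """
    # Standardize each SNP to mean 0 and variance 1.
    x = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    r2 = (x.T @ x / x.shape[0]) ** 2  # squared pairwise correlations
    return r2 @ annot

# Toy data: SNPs 0 and 1 are in perfect LD, SNP 2 is uncorrelated with both.
genotypes = np.array([[0, 0, 0],
                      [0, 0, 2],
                      [2, 2, 0],
                      [2, 2, 2]], dtype=float)
annot = np.array([[1, 0],   # SNP 0 in category A
                  [1, 0],   # SNP 1 in category A
                  [0, 1]])  # SNP 2 in category B
scores = multivariate_ld_scores(genotypes, annot)
print(scores)
```

Here SNPs 0 and 1 each get an LD Score of 2 for category A and 0 for category B, while SNP 2 gets 0 and 1. Because membership is a matrix, a SNP may belong to several categories at once.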
Multivariate LD Score regression 
allows us to partition SNP heritability 
To estimate partitioned heritability: 
• Estimate LD Scores from a reference panel. 
• Regress chi-square statistics on LD Scores. 
• The slopes give the partitioned heritability. 
• For best results, use many categories!
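The regression step above can be sketched on synthetic data. The assumptions here are loud ones: categories do not overlap, the expected chi-square is 1 + N Σ_C (h2(C)/M_C) ℓ(j, C), the LD Scores are random numbers rather than estimates from a reference panel, and the fit is unweighted least squares with no standard errors. All names and parameter values are hypothetical.

```python
import numpy as np

def estimate_partitioned_h2(chisq, ld_scores, n_samples, snps_per_category):
    """Regress chi-square on LD Scores and convert slopes to heritability.

    Under the assumed model each slope equals N * h2(C) / M_C, so
    h2(C) = slope * M_C / N. The intercept is fitted freely.
    """
    design = np.column_stack([np.ones(len(chisq)), ld_scores])
    coef, *_ = np.linalg.lstsq(design, chisq, rcond=None)
    intercept, slopes = coef[0], coef[1:]
    return intercept, slopes * np.asarray(snps_per_category) / n_samples

rng = np.random.default_rng(0)
n_samples = 50_000
snps_per_category = np.array([5_000, 45_000])  # two disjoint categories
true_h2 = np.array([0.2, 0.3])
m = snps_per_category.sum()

# Synthetic LD Scores (illustrative only, not derived from genotypes).
ld_scores = rng.uniform(1.0, 50.0, size=(m, 2))
expected = 1.0 + n_samples * ld_scores @ (true_h2 / snps_per_category)
# Each statistic is its expectation times a 1-df chi-square draw.
chisq = expected * rng.chisquare(1, size=m)

intercept, h2_hat = estimate_partitioned_h2(chisq, ld_scores, n_samples, snps_per_category)
print("estimated h2 per category:", np.round(h2_hat, 3))
```

The estimates should land near the simulated values of 0.2 and 0.3, with visibly more noise for the second category because its per-SNP slope is smaller.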
Outline 
• Description of method 
• Validation on simulated data 
• Results on real data
Multivariate LD Score regression works 
in simulations 
Simulation | True h2(DHS) | REML (2 cat) | LD Score 
Null simulations | 0.092 | 0.089 (0.006) | 0.086 (0.012), 27 cat 
DHS 3x enriched | 0.276 | 0.281 (0.006) | 0.278 (0.013), 27 cat 
FANTOM5 Enhancer* causal | 0.379 | 0.531 (0.007) | 0.361 (0.015), 27 cat 
FANTOM5 Enhancer* causal, excluded from the model | 0.379 | 0.531 (0.007) | 0.318 (0.014), 26 cat 
• Standard errors are over 100 simulations. 
• Simulated quantitative phenotype with h2 = 0.5. 
• M = 110,444, N = 2,713 
* Andersson et al. 2014 Nature
Outline 
• Description of method 
• Validation on simulated data 
• Results on real data
Datasets analyzed 
Phenotype | Citation | Sample size 
Schizophrenia | SCZ working grp of the PGC, 2014 Nature | 70,100 
Bipolar Disorder | Bip working grp of the PGC, 2011 Nat Genet | 16,731 
Rheumatoid Arthritis* | Okada et al., 2014 Nature | 38,242 
Crohn’s Disease* | Jostins et al., 2012 Nature | 20,883 
Ulcerative Colitis* | Jostins et al., 2012 Nature | 27,432 
Height | Wood et al., 2014 Nature Genetics | 253,280 
BMI | Speliotes et al., 2010 Nature Genetics | 123,865 
Coronary Artery Disease | Schunkert et al., 2011 Nature Genetics | 86,995 
College (yes/no) | Rietveld et al., 2013 Science | 126,559 
Type 2 Diabetes | Morris et al., 2012 Nature Genetics | 69,033 
*HLA locus excluded from all analyses for autoimmune traits
Annotations used 
Mark | Source/reference 
Coding, 3’ UTR, 5’ UTR, Promoter, Intron | UCSC; Gusev et al., in press AJHG 
Digital Genomic Footprint, TFBS | ENCODE; Gusev et al., in press AJHG 
CTCF binding site, Promoter Flanking, Repressed, Transcribed, TSS, Enhancer, Weak Enhancer | ENCODE; Hoffman et al., 2012 Nucleic Acids Research 
DHS, fetal DHS, H3K4me1, H3K4me3, H3K9ac | Trynka et al., 2013 Nature Genetics* 
Conserved | Lindblad-Toh et al., 2011 Nature 
FANTOM5 Enhancer | Andersson et al., 2014 Nature 
lincRNAs | Cabili et al., 2011 Genes Dev 
DHS and DHS promoter | Maurano et al., 2012 Science 
H3K27ac | Roadmap; PGC2 2014 Nature 
*Post-processed from ENCODE and Roadmap data by S. Raychaudhuri and X. Liu labs
[Figure: Coding, Intergenic, Enhancer, H3K4me3, and DHS enrichments in six phenotypes. Bars indicate 95% confidence intervals.] 
[Figure: Coding, Intergenic, Enhancer, H3K4me3, DHS, and Conserved* enrichments in six phenotypes. Bars indicate 95% confidence intervals. *Lindblad-Toh et al., 2011 Nature] 
[Figure: Coding, Intergenic, Enhancer, H3K4me3, DHS, and FANTOM5 Enhancer* enrichments in six phenotypes. Bars indicate 95% confidence intervals. *Andersson et al., 2014 Nature]
Cell-type specific H3K27ac enrichments 
inform trait biology 
• We group 56 cell types into 7 basic categories. 
• For each trait (10 traits) and each category (7 categories), 
we assess the significance of the improvement to 
the model from adding that category.
Conclusions 
• Many annotations are enriched in many 
phenotypes. 
• Conserved regions, 2.6% of SNPs, are 
estimated to explain 30% of heritability on 
average. 
• FANTOM5 Enhancers are extremely enriched 
in autoimmune traits. 
• H3K27ac cell-type enrichment matches and 
extends our understanding of disease biology.
 

Partitioning Heritability using GWAS Summary Statistics with LD Score Regression

  • 1. Partitioning heritability by functional annotation using summary statistics Hilary Finucane MIT Department of Mathematics HSPH Department of Epidemiology October 21, 2014
  • 2. Acknowledgements • Brendan Bulik-Sullivan • Alkes Price • Ben Neale • Alexander Gusev • Nick Patterson • Po-Ru Loh • Gosia Trynka • Han Xu • Verneri Anttila • Yakir Reshef • Chongzhi Zang • Stephan Ripke • Schizophrenia Working Group of the PGC • Shaun Purcell • Mark Daly • Eli Stahl • Soumya Raychaudhuri • Sara Lindstrom
  • 3. Partitioning heritability by functional annotation is an important goal • Learn about genetic architecture of disease – Where does the heritability lie? • Learn about disease biology – What are the relevant cell types? • Learn about the functional annotations – Which functional annotations show the highest enrichments? • Downstream applications – Fine mapping – Risk prediction – GWAS priors Maurano et al. 2012 Science Trynka et al. 2013 Nat Genet Pickrell 2014 AJHG
  • 4. What is partitioned heritability? • Our model is Y = Σ_j X_j β_j + ε, where • Y is an individual’s phenotype, • X_j is an individual’s genotype at the j-th SNP (normalized to mean 0 and variance 1), • β_j is the effect of SNP j, and • ε is noise and random environmental effects.
  • 5. What is partitioned heritability? • Our model is • We define heritability as
  • 6. What is partitioned heritability? • Our model is • We define heritability as and the heritability of a category as
  • 7. Partitioning heritability using variance components has yielded many insights • 31% of schizophrenia SNP-heritability lies in CNS+ gene regions spanning 20% of the genome [1]. • 28% of Tourette syndrome SNP-heritability and 29% of OCD SNP-heritability lie in parietal lobe eQTLs spanning 5% of the genome [2]. • 79% of SNP-heritability, averaged across WTCCC and WTCCC2 traits, lies in DHS regions spanning 16% of the genome [3]. [1] Lee et al. 2012 Nat Genet [2] Davis et al. 2013 PLoS Genet [3] Gusev et al. in press AJHG
  • 8. A method for partitioning heritability from summary statistics is needed • Variance components methods are intractable at very large sample sizes. • There is lots of information in large meta-analyses. • Lots of publicly available summary statistics allow us to compare many phenotypes and many annotations to get a big picture.
  • 9. Our method partitions heritability from summary statistics • Input: – Sample size and p-value for every SNP tested in a large GWAS of a quantitative or case-control trait – LD information from a reference panel like 1000G – Genome annotation of interest – Other genome annotations to include in the model.
  • 10. Our method partitions heritability from summary statistics • Input: – Sample size and p-value for every SNP tested in a large GWAS of a quantitative or case-control trait – LD information from a reference panel like 1000G – Genome annotation of interest – Other genome annotations to include in the model. • Output: – Estimated proportion of heritability that falls within the annotation of interest. – Enrichment = (% of heritability) / (% of SNPs)
  • 11. Outline • Description of method • Validation on simulated data • Results on real data
  • 12. Outline • Description of method • Validation on simulated data • Results on real data
  • 13. LD is important for summary statistics-based methods • Some SNPs have a lot of LD to other SNPs in the same category. • Some SNPs have a lot of LD to SNPs in other categories. • Some SNPs do not have a lot of LD to other SNPs.
  • 14. LD is important for summary statistics-based methods • Some SNPs have a lot of LD to other SNPs in the same category. • Some SNPs have a lot of LD to SNPs in other categories. • Some SNPs do not have a lot of LD to other SNPs. Our solution: LD Score Regression. See Bulik-Sullivan et al., bioRxiv (under revision, Nat Genet) and ASHG 2014 poster 1787T (Bulik-Sullivan).
  • 15. LD Score Regression: basic intuition • Polygenicity causes more chi-square statistic inflation in high LD regions than in low LD regions. [Figure: chi-square statistics in a high LD region and a low LD region. Mean chi-square for high LD region: high. Mean chi-square for low LD region: low.]
  • 16. Multivariate LD Score Regression: basic intuition • High chi-square / Low chi-square: Enriched category → BIG difference between lots of LD vs little LD to the category • Low chi-square / Low chi-square: Depleted category → SMALL difference between lots of LD vs little LD to the category
  • 17. Multivariate LD Score regression allows us to partition SNP heritability • Multivariate LD Score: the sum over all SNPs in a category of r^2 with that SNP.
  • 18. Multivariate LD Score regression allows us to partition SNP heritability • Multivariate LD Score: the sum over all SNPs in a category of r^2 with that SNP. • Derivations based on a polygenic model give:
  • 19. Multivariate LD Score regression allows us to partition SNP heritability • Multivariate LD Score: the sum over all SNPs in a category of r^2 with that SNP. • Derivations based on a polygenic model give: • Easily extends to overlapping categories.
  • 20. Multivariate LD Score regression allows us to partition SNP heritability To estimate partitioned heritability: • Estimate LD Scores from a reference panel. • Regress chi-square statistics on LD Scores. • The slopes give the partitioned heritability. • For best results, use many categories!
  • 21. Outline • Description of method • Validation on simulated data • Results on real data
  • 22. Multivariate LD Score regression works in simulations • Null simulations: True h2(DHS) 0.092; REML (2 cat) 0.089 (0.006); LD Score (27 cat) 0.086 (0.012) • DHS 3x enriched: True h2(DHS) 0.276; REML (2 cat) 0.281 (0.006); LD Score (27 cat) 0.278 (0.013) • Standard errors are over 100 simulations. • Simulated quantitative phenotype with h2 = 0.5. • M = 110,444, N = 2,713
  • 23. Multivariate LD Score regression works in simulations • Null simulations: True h2(DHS) 0.092; REML (2 cat) 0.089 (0.006); LD Score (27 cat) 0.086 (0.012) • DHS 3x enriched: True h2(DHS) 0.276; REML (2 cat) 0.281 (0.006); LD Score (27 cat) 0.278 (0.013) • FANTOM5 Enhancer* causal: True h2(DHS) 0.379; REML (2 cat) 0.531 (0.007); LD Score (27 cat) 0.361 (0.015) • Standard errors are over 100 simulations. • Simulated quantitative phenotype with h2 = 0.5. • M = 110,444, N = 2,713 * Andersson et al. 2014 Nature
  • 24. Multivariate LD Score regression works in simulations • Null simulations: True h2(DHS) 0.092; REML (2 cat) 0.089 (0.006); LD Score (27 cat) 0.086 (0.012) • DHS 3x enriched: True h2(DHS) 0.276; REML (2 cat) 0.281 (0.006); LD Score (27 cat) 0.278 (0.013) • FANTOM5 Enhancer* causal: True h2(DHS) 0.379; REML (2 cat) 0.531 (0.007); LD Score (27 cat) 0.361 (0.015) • FANTOM5 Enhancer* causal, excluded from the model: True h2(DHS) 0.379; REML (2 cat) 0.531 (0.007); LD Score (26 cat) 0.318 (0.014) • Standard errors are over 100 simulations. • Simulated quantitative phenotype with h2 = 0.5. • M = 110,444, N = 2,713 * Andersson et al. 2014 Nature
  • 25. Outline • Description of method • Validation on simulated data • Results on real data
  • 26. Datasets analyzed
      Phenotype | Citation | Sample size
      Schizophrenia | SCZ working grp of the PGC, 2014 Nature | 70,100
      Bipolar Disorder | Bip working grp of the PGC, 2011 Nat Genet | 16,731
      Rheumatoid Arthritis* | Okada et al., 2014 Nature | 38,242
      Crohn’s Disease* | Jostins et al., 2012 Nature | 20,883
      Ulcerative Colitis* | Jostins et al., 2012 Nature | 27,432
      Height | Wood et al., 2014 Nature Genetics | 253,280
      BMI | Speliotes et al., 2010 Nature Genetics | 123,865
      Coronary Artery Disease | Schunkert et al., 2011 Nature Genetics | 86,995
      College (yes/no) | Rietveld et al., 2013 Science | 126,559
      Type 2 Diabetes | Morris et al., 2012 Nature Genetics | 69,033
      *HLA locus excluded from all analyses for autoimmune traits
  • 27. Annotations used
      Mark | Source/reference
      Coding, 3’ UTR, 5’ UTR, Promoter, Intron | UCSC; Gusev et al., in press AJHG
      Digital Genomic Footprint, TFBS | ENCODE; Gusev et al., in press AJHG
      CTCF binding site, Promoter Flanking, Repressed, Transcribed, TSS, Enhancer, Weak Enhancer | ENCODE; Hoffman et al., 2012 Nucleic Acids Research
      DHS, fetal DHS, H3K4me1, H3K4me3, H3K9ac | Trynka et al., 2013 Nature Genetics*
      Conserved | Lindblad-Toh et al., 2011 Nature
      FANTOM5 Enhancer | Andersson et al., 2014 Nature
      lincRNAs | Cabili et al., 2011 Genes Dev
      DHS and DHS promoter | Maurano et al., 2012 Science
      H3K27ac | Roadmap; PGC2 2014 Nature
      *Post-processed from ENCODE and Roadmap data by S. Raychaudhuri and X. Liu labs
  • 28. Coding, Intergenic, Enhancer, H3K4me3, and DHS enrichments in six phenotypes (Bars indicate 95% confidence intervals)
  • 29. Coding, Intergenic, Enhancer, H3K4me3, DHS, and Conserved enrichments in six phenotypes (Bars indicate 95% confidence intervals) *Lindblad-Toh et al., 2011 Nature
  • 30. Coding, Intergenic, Enhancer, H3K4me3, DHS, and FANTOM5 Enhancer enrichments in six phenotypes (Bars indicate 95% confidence intervals) *Andersson et al., 2014 Nature
  • 31. Cell-type specific H3K27ac enrichments inform trait biology • We group 56 cell types into 7 basic categories. • For each trait (10 traits) – For each category (7 categories) • We assess the significance of improvement to the model from adding that category.
  • 32.
  • 33. Conclusions • Many annotations are enriched in many phenotypes. • Conserved regions, 2.6% of SNPs, are estimated to explain 30% of heritability on average. • FANTOM5 Enhancers are extremely enriched in auto-immune traits. • H3K27ac cell-type enrichment matches and extends our understanding of disease biology.
  • 34. Acknowledgements • Brendan Bulik-Sullivan • Alkes Price • Ben Neale • Alexander Gusev • Nick Patterson • Po-Ru Loh • Gosia Trynka • Han Xu • Verneri Anttila • Yakir Reshef • Chongzhi Zang • Stephan Ripke • Schizophrenia Working Group of the PGC • Shaun Purcell • Mark Daly • Eli Stahl • Soumya Raychaudhuri • Sara Lindstrom
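
The recipe on slide 20 (estimate LD Scores, regress chi-square statistics on them, read partitioned heritability off the slopes) and the enrichment ratio on slide 10 can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the authors' software: the function names, the toy block-diagonal LD matrix, the noise-free chi-square values, the assumed relation E[chi2_j] = N * sum_C tau_C * l(j, C) + 1, and the use of plain unweighted least squares. A real analysis would also need regression weights and standard errors, which are not shown.

```python
import numpy as np


def toy_r2(block_sizes, r2_within=0.5):
    """Toy LD matrix (assumption): r^2 = r2_within inside a block, 0 across blocks."""
    m = sum(block_sizes)
    r2 = np.zeros((m, m))
    start = 0
    for size in block_sizes:
        r2[start:start + size, start:start + size] = r2_within
        start += size
    np.fill_diagonal(r2, 1.0)
    return r2


def ld_scores(r2, annot):
    """l(j, C): sum over SNPs k in category C of r^2 between SNP j and SNP k."""
    return r2 @ annot


def partition_heritability(chi2, ld, annot, n_samples):
    """Regress chi-square on LD scores and convert the slopes to h2 per category."""
    n_snps = ld.shape[0]
    design = np.column_stack([ld, np.ones(n_snps)])  # LD scores plus intercept
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    tau = coef[:-1] / n_samples      # per-SNP contribution of each category
    per_snp_h2 = annot @ tau         # sums contributions of overlapping categories
    h2_cat = annot.T @ per_snp_h2    # heritability of the SNPs in each category
    prop_h2 = h2_cat / per_snp_h2.sum()
    prop_snps = annot.sum(axis=0) / n_snps
    return tau, h2_cat, prop_h2 / prop_snps, coef[-1]


# Hypothetical toy data: 248 SNPs, a base category plus one functional category.
rng = np.random.default_rng(0)
r2 = toy_r2([1, 2, 4, 8, 16] * 8)
n_snps = r2.shape[0]
functional = (rng.random(n_snps) < 0.2).astype(float)
annot = np.column_stack([np.ones(n_snps), functional])
tau_true = np.array([1e-3, 4e-3])
n_samples = 10_000

ld = ld_scores(r2, annot)
chi2 = n_samples * ld @ tau_true + 1.0  # expected chi-square, no sampling noise

tau_hat, h2_cat, enrichment, intercept = partition_heritability(
    chi2, ld, annot, n_samples
)
print("tau:", tau_hat)
print("h2 per category:", h2_cat)
print("enrichment:", enrichment)
```

Because the toy chi-square values carry no sampling noise, the regression recovers the assumed per-category coefficients essentially exactly; with real summary statistics the estimates are noisy and the choice of categories matters, as the simulations on slides 22 to 24 show.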

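The equations shown on slides 4 to 6 and 18 to 19 did not survive in the transcript. For reference, the quantities those slides describe can be written as below. This is a reconstruction under stated assumptions, not a copy of the slides: it follows the published formulation of stratified LD Score regression, assumes the phenotype is scaled to variance 1, and uses the symbols \tau_C (per-SNP contribution of category C), \ell(j, C) (LD Score of SNP j with respect to category C) and N (sample size), which are not named in the transcript. The published form also carries an extra confounding term in the expected chi-square, omitted here.

```latex
% Linear model of slide 4 (genotypes standardized to mean 0, variance 1)
Y = \sum_{j} X_j \beta_j + \varepsilon

% Heritability, and heritability of a category C (assumes Var(Y) = 1)
h^2 = \sum_{j} \beta_j^2, \qquad h^2(C) = \sum_{j \in C} \beta_j^2

% Multivariate LD Score: sum over SNPs k in C of r^2 with SNP j
\ell(j, C) = \sum_{k \in C} r_{jk}^2

% Expected chi-square statistic of SNP j under the polygenic model
\mathbb{E}\left[\chi_j^2\right] = N \sum_{C} \tau_C \, \ell(j, C) + 1
```

Under this form the regression slopes estimate N \tau_C, which is why slide 20 says the slopes give the partitioned heritability.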
Editor's Notes

  1. For a GWAS of a common complex trait, most of the heritability, and so most of the information, lies in the majority of SNPs that do not reach statistical significance. Partitioning heritability is a way to leverage all of the SNPs, instead of just the statistically significant SNPs, to answer questions about genetic architecture, about the biology of disease, and about functional annotations.
  2. Note: this extends to case-control traits under a liability threshold model. Note: equivalent to other definitions under certain assumptions.
  3. Partitioning heritability is traditionally done with a variance components method such as REML, as implemented in GCTA, and has yielded many insights in the past. I’d like to highlight this recent result of Gusev et al. that non-coding DHS regions comprising 16% of the genome explain an estimated 79% of heritability on average across 11 traits.
  4. We need a method for partitioning heritability from summary statistics for two reasons. First, many of our largest datasets are meta-analyses for which no one has the genotype data required for a variance components approach. Second, even when we do have all of the genotypes, variance components methods are intractable, especially for more than a handful of components. As an added benefit of computational ease, we can analyze many phenotypes and many annotations to look for higher-level patterns.