1. Rampant Purifying Selection Drives Singleton
Variants To Be Major Source Of Heritability For
Human Gene Expression
Ryan D. Hernandez
ryan.hernandez@me.com
@rdhernand
Postdoc positions available!!!
2. Singleton Variants Dominate The Genetic
Architecture Of Human Gene Expression
Hernandez, Uricchio, Hartman, Ye, Dahl, Zaitlen
Selection And Explosive Growth Alter
Genetic Architecture And Hamper The
Detection Of Causal Rare Variants
Uricchio, Zaitlen, Ye, Witte, Hernandez. Genome Research (2016).
3. Heritability and Human Height
http://i.ytimg.com/vi/E0Aeks_id6c/maxresdefault.jpg
Heritability: how much of the variation in height among people can be explained by genetic factors?
This is not the same as asking how much genetic factors influence height in any one person.
https://en.wikipedia.org/wiki/Heritability
4. GWAS have the potential to explain
60% of the variation in height
$h_g^2$: The narrow-sense heritability explained by all genotyped SNPs.
250,000 subjects
Wood et al, 2014 Nat. Genet.
i.ytimg.com/vi/E0Aeks_id6c/maxresdefault.jpg
5. Challenges For Studying Complex Diseases
The case of the missing heritability
Nature, Vol 456, 6 November 2008. News Feature: Personal Genomes
Maher, Nature (2008).
7. The Effect of Negative Selection
Deleterious
mutations will
arise in the next
generation
Chromosomes in
a population with
standing variation
Negative selection:
the action of
natural selection
purging deleterious
mutations.
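A minimal sketch of how negative selection purges a deleterious allele. It assumes a deterministic haploid model with no drift and no new mutation, which is a simplification and not something stated on the slide; the function name and parameter values are hypothetical.

```python
def allele_frequency_trajectory(p0, s, generations):
    """Deterministic haploid selection: p' = p(1 + s) / (1 + p * s).

    p0: starting frequency of the mutation
    s:  selection coefficient (negative for a deleterious mutation)
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Carriers have relative fitness 1 + s; mean fitness is 1 + p * s.
        p = p * (1.0 + s) / (1.0 + p * s)
        freqs.append(p)
    return freqs


trajectory = allele_frequency_trajectory(p0=0.05, s=-0.01, generations=500)
print(f"start: {trajectory[0]:.4f}, after 500 generations: {trajectory[-1]:.6f}")
```

With s < 0 the frequency falls every generation, so strongly deleterious mutations are expected to stay rare.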
8. Majority of human genetic
variation is rare
Class         Fraction of variants < 1%
Missense      92.6%
Synonymous    88.5%
Non-coding    82.3%
[Figure: Variants with frequency <1%. Fraction of variants (y-axis, 0.0 to 0.6) by number of non-reference alleles (x-axis, log scale, 1 to 5000) for missense, synonymous, and noncoding variants, compared with a neutral model.]
9. [Figure: selection coefficient (s) classes of variants in each derived allele frequency bin. x-axis (derived allele frequency): 1/10,000, 2/10,000, 0.03–0.1%, 0.1–0.5%, 0.5–1%, 1–2%, 2–5%, 5–10%, 10–100%. y-axis: 0 to 1.0.]
Legend:
Synonymous
Neutral: $-10^{-5} < s < 0$
Nearly Neut.: $-10^{-4} < s < -10^{-5}$
Weak: $-10^{-3} < s < -10^{-4}$
Moderate: $-10^{-2} < s < -10^{-3}$
Strong: $s < -10^{-2}$
Maher, et al. Hum Hered 74, 118-128 (2012).
10. Evolutionary Models Of
Complex Disease
Other
Phenotype
SNP
Disease
propensity
Disease
Fitness
Pleiotropy: SNP impacts multiple phenotypes
ρ:
correlation(effect size, fitness)
(Simons et al, 2014)
τ:
transforms fitness effect to
phenotype (Eyre-Walker, 2010)
Uricchio et al., Genome Research (2016)
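One way the two parameters could be combined is sketched below. This is an illustrative reading, not the exact model from the cited papers: with probability ρ a variant's effect size is tied to its own selection coefficient, otherwise to a selection coefficient borrowed at random from another variant, and τ is the exponent that maps fitness effect to phenotypic effect. The function name, the scaling constant `delta`, and the gamma distribution used for `s` are all assumptions.

```python
import numpy as np


def sample_effect_sizes(s, rho, tau, delta=1.0, rng=None):
    """Map selection coefficients to effect sizes (illustrative sketch).

    s:     array of selection coefficients, one per causal variant
    rho:   probability that a variant's effect size depends on its own s
    tau:   exponent transforming fitness effect into phenotypic effect
    delta: hypothetical scaling constant
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=float)
    # Shuffled copy: same marginal distribution, decoupled from each variant.
    s_random = rng.permutation(s)
    coupled = rng.random(s.size) < rho
    s_used = np.where(coupled, s, s_random)
    return delta * np.abs(s_used) ** tau


rng = np.random.default_rng(0)
s = -rng.gamma(shape=0.2, scale=0.01, size=1000)  # hypothetical deleterious s values
z = sample_effect_sizes(s, rho=0.9, tau=0.5, rng=rng)
print("correlation(|s|, effect size):", np.corrcoef(np.abs(s), z)[0, 1])
```

Setting `rho=1` makes effect size a deterministic function of fitness effect; setting `rho=0` removes the relationship while keeping the same distribution of effect sizes.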
11. Growth model: Gravel et al (2011)
Explosive growth: Tennessen et al (2012)
AFRICA EUROPE
[Figure: selection coefficient classes (Synonymous; Neutral; Nearly Neut.; Weak; Moderate; Strong) in each derived allele frequency bin: 1/10,000, 2/10,000, 0.03–0.1%, 0.1–0.5%, 0.5–1%, 1–2%, 2–5%, 5–10%, 10–100%. y-axis: 0 to 1.0.]
Fig. 3. The distribution of selection coefficients for variants in each allele frequency bin (starting with singletons, doubletons, and then a disjoint partition of rare and common variants). Common variants (>5%) are expected to be primarily neutral or nearly neutral. Most strongly deleterious mutations have frequency <0.1%.
Fitness effects in non-coding DNA:
Torgerson et al (2009)
effect size = f(demography, natural selection)
Human-specific demography and Selection
Uricchio, et al. Genome Res 26, 863-873 (2016).
12. Neutral model: most variance
explained by common alleles
[Figure: Standard Neutral Model. x-axis: derived allele frequency, ω (log scale, 5e-04 to 1). y-axis: proportion of h2 (variance) explained by alleles with frequency ≤ ω, $V_\omega/V_1$ (0.0 to 1.0). Frequency classes marked: ultra-rare, intermediate rare, common.]
Uricchio, et al. Genome Res 26, 863-873 (2016).
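The neutral expectation can be reproduced with a short calculation. The sketch below assumes the standard neutral site frequency spectrum (expected number of sites with derived count i proportional to 1/i), effect sizes that are independent of frequency, and a per-site variance contribution of 2p(1 − p). Under those assumptions the cumulative proportion is approximately 2ω − ω², so rare alleles explain very little. The sample size and function names are hypothetical.

```python
import numpy as np


def cumulative_variance_neutral(n):
    """Return (frequencies, V_omega / V_1) for a sample of n chromosomes."""
    counts = np.arange(1, n)                  # derived allele counts 1..n-1
    freq = counts / n
    sfs = 1.0 / counts                        # neutral SFS, up to a constant
    per_site_var = 2.0 * freq * (1.0 - freq)  # unit effect size at every site
    contrib = sfs * per_site_var
    return freq, np.cumsum(contrib) / contrib.sum()


def variance_explained_below(omega, n=10_000):
    """Proportion of variance from alleles with frequency <= omega."""
    freq, ratio = cumulative_variance_neutral(n)
    k = np.searchsorted(freq, omega, side="right")  # number of classes <= omega
    return 0.0 if k == 0 else float(ratio[k - 1])


for omega in (0.001, 0.01, 0.05, 0.5):
    print(f"omega = {omega}: V_omega / V_1 = {variance_explained_below(omega):.4f}")
```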
13. Evolutionary Model: Most Variance explained
by Ultra Rare Alleles or common alleles…
[Figure: AFR, Growth. x-axis: derived allele frequency, ω (log scale, 5e-04 to 5e-01). y-axis: proportion of h2 explained by alleles with frequency ≤ ω, $V_\omega/V_1$ (0.0 to 1.0).]
Uricchio, et al. Genome Res 26, 863-873 (2016).
14. Evolutionary Model: Most Variance explained
by Ultra Rare Alleles or common alleles…
[Figure panels from Uricchio et al.: proportion of variance explained by alleles with frequency ≤ x.]
15. Demography and selection shape the
contribution of ultra-rare variants
[Figure: proportion of variance explained by singletons.]
Uricchio, et al. Genome Res 26, 863-873 (2016).
•European demography
(Gravel et al., 2011)
•n=500 chromosomes
16. • Large-scale RNA sequencing + WGS
• 4 European populations
• 360 individuals
• low coverage WGS + high coverage
exome: Phase 3.
• RNA-seq: median depth 58.3M reads
• Gene expression:
log2 transformed, median centered,
and quantile normalized.
• 10,077 unique genes.
FIN
GBR
TSI
CEU
YRI
Noah
Zaitlen
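The expression processing listed on this slide could look roughly like the sketch below. The slide names the three steps but not their details, so the pseudocount, the choice to median-center each gene, the choice to quantile-normalize across samples, and the order of operations are all assumptions, as is the function name.

```python
import numpy as np


def normalize_expression(values, pseudocount=1.0):
    """values: genes x samples array of non-negative expression values."""
    # 1. log2 transform (pseudocount avoids log of zero; an assumption).
    logged = np.log2(np.asarray(values, dtype=float) + pseudocount)
    # 2. Median-center each gene (row).
    centered = logged - np.median(logged, axis=1, keepdims=True)
    # 3. Quantile-normalize samples (columns): every sample receives the
    #    same reference distribution, the mean of the sorted columns.
    ranks = np.argsort(np.argsort(centered, axis=0), axis=0)
    reference = np.sort(centered, axis=0).mean(axis=1)
    return reference[ranks]


rng = np.random.default_rng(0)
demo = rng.gamma(2.0, 50.0, size=(50, 6))  # hypothetical 50 genes x 6 samples
normalized = normalize_expression(demo)
print(normalized.shape)
```

Ties are ignored here for brevity; real data with many identical values would need an explicit tie rule.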
17. • Our sample size is
small, but can we
learn anything about
the genetic basis of
complex traits from
these 10k genes?
• Let’s analyze
heritability of gene
expression due to cis
variation (within 1Mb
of gene)
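Selecting cis variants for one gene might be sketched as follows. The slide only says "within 1Mb of gene", so measuring the window from the gene's start and end coordinates (and assuming all positions are on the gene's chromosome) is an assumption; the function name and coordinates are hypothetical.

```python
import numpy as np


def cis_variant_indices(variant_positions, gene_start, gene_end, window=1_000_000):
    """Indices of variants within `window` bp of the gene boundaries."""
    pos = np.asarray(variant_positions)
    mask = (pos >= gene_start - window) & (pos <= gene_end + window)
    return np.flatnonzero(mask)


positions = [500, 1_500_000, 2_600_000, 4_000_000]
print(cis_variant_indices(positions, gene_start=1_600_000, gene_end=1_650_000))
```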
18. Heritability
• Several pioneers have developed ways to infer heritability
from unrelated individuals
• Linear Mixed Models (GCTA/LDAK)
• Require complex optimization procedures that often fail
with very small sample sizes.
• Can be biased.
• Haseman-Elston (HE) regression
• Much faster than LMM.
• Unbiased.
• No sample size issues!
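As a rough illustration of the HE regression idea: regress the product of standardized phenotypes for each pair of individuals on their genetic relatedness, and read the slope as an estimate of h2. This is a minimal single-component sketch on simulated data, not the analysis pipeline used in the talk; the simulation settings (n, m, h2, allele frequency range) and function names are hypothetical.

```python
import numpy as np


def simulate_trait(n=1000, m=400, h2=0.5, seed=0):
    """Simulate standardized genotypes and a phenotype with heritability h2."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.05, 0.5, size=m)                 # allele frequencies
    g = rng.binomial(2, p, size=(n, m)).astype(float)  # unlinked SNPs
    z = (g - g.mean(axis=0)) / g.std(axis=0)           # standardize each SNP
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)    # every SNP is causal
    y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return z, (y - y.mean()) / y.std()


def he_regression(z, y):
    """Slope of y_i * y_j on GRM entry K_ij over all pairs i < j."""
    n, m = z.shape
    grm = z @ z.T / m                                  # genetic relatedness matrix
    i, j = np.triu_indices(n, k=1)                     # each unordered pair once
    x = grm[i, j]
    prod = y[i] * y[j]
    x_c = x - x.mean()
    return float(np.dot(x_c, prod - prod.mean()) / np.dot(x_c, x_c))


z, y = simulate_trait()
print(f"HE estimate of h2: {he_regression(z, y):.3f} (simulated with h2 = 0.5)")
```

The estimate is a closed-form least-squares slope, which is why no iterative optimization is needed.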
19. Major Problem
• We can apply a method to data and get an answer, but we have
no idea whether it would actually be a good one!
• There is no complex trait in which we know:
• The number of causal variants
• The frequencies of the causal variants
• The effect sizes of the causal variants
• The fitness effect of the causal variants
• We need a thorough simulation study where we can vary all of
these parameters and see how they affect our answer!
21. Our Approach:
Partitioning SNPs by frequency
[Figure: bias of h2 per SNP bin (y-axis, 0.0 to 1.0), one panel each for K=1, K=2, K=5, K=10, and K=20 bins.]
[Figure: selection coefficient classes (Synonymous; Neutral; Nearly Neut.; Weak; Moderate; Strong) by derived allele frequency bin, from 1/10,000 and 2/10,000 up to 10–100%.]
22. Singletons emerge as their own category!
[Figure: bias of h2 per SNP bin (y-axis, 0.0 to 1.0), one panel each for K=1, K=2, K=5, K=10, and K=20 bins.]
Our Approach:
Partitioning SNPs by frequency
Hernandez, Uricchio, Hartman, Ye, Dahl, Zaitlen. bioRxiv (2017)
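The frequency partitioning on these slides could be sketched as below: assign SNPs to K bins by minor allele count (MAC), build one relatedness matrix per bin, and fit all bins jointly. Binning by MAC quantiles and fitting with a multi-component HE regression are assumptions made for illustration, not details confirmed by the slides; the function names are hypothetical. Because SNPs with the same MAC always share a bin, singletons can end up in a bin of their own when they are numerous.

```python
import numpy as np


def partition_by_frequency(mac, k):
    """Assign each SNP to a bin using quantiles of minor allele count."""
    mac = np.asarray(mac)
    edges = np.quantile(mac, np.linspace(0.0, 1.0, k + 1)[1:-1])
    raw = np.searchsorted(edges, mac, side="left")
    # Relabel so bins are consecutive integers even if some quantile bins are empty.
    return np.unique(raw, return_inverse=True)[1]


def partitioned_he(z, y, bins):
    """Jointly regress y_i * y_j on one GRM per bin; returns {bin: h2 estimate}.

    z: standardized genotypes (individuals x SNPs); y: standardized phenotype.
    """
    n = z.shape[0]
    i, j = np.triu_indices(n, k=1)
    labels = np.unique(bins)
    design = [np.ones(i.size)]                  # intercept column
    for b in labels:
        zb = z[:, bins == b]
        design.append((zb @ zb.T / zb.shape[1])[i, j])
    coef, *_ = np.linalg.lstsq(np.column_stack(design), y[i] * y[j], rcond=None)
    return dict(zip(labels.tolist(), [float(c) for c in coef[1:]]))


mac_demo = np.array([1, 1, 1, 1, 2, 3, 10, 50])
print(partition_by_frequency(mac_demo, 2))      # singletons fall in the first bin
```

Summing the per-bin estimates gives a total h2, and comparing bins shows which frequency classes carry the signal.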
23. Our Approach:
Partitioning SNPs by frequency
[Figure: bias of h2 per SNP bin (y-axis, 0.0 to 1.0), one panel each for K=1, K=2, K=5, K=10, and K=20 bins.]
Hernandez, Uricchio, Hartman, Ye, Dahl, Zaitlen. bioRxiv (2017)
27. Summary
•We still know little about the genetic basis of most complex
traits.
•Our evolutionary modeling gives insight into what the
patterns of heritability should look like.
•Analyzing heritability across the expression of ~10k genes
demonstrates a major impact of rare variants for most genes.
•But our power to detect causal loci using existing tests is
very poor.