gnomAD at AGBT

Variation across 141,456
individuals reveals the
spectrum of loss-of-function
intolerance of the human
genome
Konrad Karczewski
March 2, 2019
@konradjk
broad.io/gnomad_lof

Range of LoF impact
embryonic lethal
recessive disease
non-essential
complex disease
beneficial
haploinsufficient disease

Identifying true LoF variants is challenging
• LoFs are rare
• LoFs are enriched for artifacts

• gnomAD: 125,748
exomes and 15,708
whole genomes
Increasing the scale of reference databases

gnomAD 2.1.1
• Data provided by 109 PIs
• 1.3 and 1.6 petabytes of BAM files
• Uniformly processed and joint called
• 12 and 24 terabyte VCFs
• Developed a novel QC pipeline
• Complete pipeline publicly available: broad.io/gnomad_qc
• All QC and analysis performed using Hail: hail.is
• Scalable to thousands of CPUs
• Enabled rapid iteration (few hours for each component,
few days for entire process)
Grace
Tiao
Laurent
Francioli
Broad Genomics Platform
Broad Data Sciences Platform
Tim PoterbaCotton Seed

gnomAD 2.1.1
• Sub-continental ancestry
• Subsets:
• controls-only
• non-neuro/non-psychiatric
• non-cancer
• non-TOPMed Bravo
Grace
Tiao
Laurent
Francioli
http://gnomad.broadinstitute.org

Staggering amounts of variation
synonymous
missense
pLoF
0
1,000,000
2,000,000
3,000,000
4,000,000
5,000,000
0 40,000 80,000 125,748
Sample size
Numberobserved
• gnomAD 2.1 contains:
• 230M variants in 15,708
genomes
exomes

Staggering amounts of pLoFs
• gnomAD 2.1 contains:
genomes
exomes
• Of these, we observe
515,326 predicted loss-of-
function (pLoF) variants
• Stop-gained
• Essential splice
• Frameshift indel
pLoF
0
100,000
200,000
300,000
400,000
0 40,000 80,000 125,748
Sample size
Numberobserved

LOFTEE removes benign variation
• LoF filtering plugin to VEP, LOFTEE
• Variants retained by LOFTEE are:
• more rare, and thus
• more deleterious
• After filtering, we discover 443,769
high-confidence pLoFs in gnomAD
https://github.com/konradjk/loftee
●
●
●
●
0.00
0.05
0.10
0.15
synonymous
missense
low
confidence pLoF
high confidence pLoF
MAPS

Detecting genes depleted for pLoFs
• Mutational model that predicts the number of SNVs in a given
functional class we would expect to see in each gene in a cohort
• Now incorporating methylation, improved coverage correction, LOFTEE
• Previously transformed into the probability of LoF intolerance (pLI)
• Applying to 125,748 gnomAD exomes
• Median of 17.3 pLoFs expected per gene
• Direct estimate of observed/expected ratio
• New scores: LOEUF
• Upper bound of 90% confidence interval
Kaitlin Samocha
(Samocha et al. 2014;
Lek et al. 2016)

Most genes are depleted of LoF variation
MED13L FNDC3B
Phenotype Severe Intellectual Disability None
Observed Expected Obs/Exp (CI) Observed Expected Obs/Exp (CI)
Synonymous 462 465 0.993 (0.92-1.07) 271 266 1.02 (0.92-1.13)
pLoF 0 102 0 (0-0.029) 0 68 0 (0-0.043)
• Many are extremely depleted
(<20% observed compared to
expected)
• Including most known (curated)
haploinsufficient genes
• Using upper bound of
confidence interval corrects
for small genes
0
500
1000
1500
0.0 0.5 1.0 1.5
Observed/Expected
Numberofgenes
0
200
400
600
800
0.0 0.5 1.0 1.5 2.0
LOEUF
Numberofgenes

• Binning this spectrum into deciles
Resolving the spectrum of LoF intolerance
Haploinsufficient
Autosomal Recessive
Olfactory Genes
0%
20%
40%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofgenelist
More depleted
More constrained
More tolerant
Less constrained

• Known haploinsufficient genes have ~10% of the expected pLoFs
Haploinsufficient
Autosomal Recessive
Olfactory Genes
0%
20%
40%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofgenelist

• Autosomal recessive genes are centered around 60% of expected
Haploinsufficient
Autosomal Recessive
Olfactory Genes
0%
20%
40%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofgenelist
Gene list from:
Blekhman et al., 2008
Berg et al., 2013

Haploinsufficient
Autosomal Recessive
Olfactory Genes
0%
20%
40%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofgenelist
• Some genes, e.g. olfactory receptors, are unconstrained

Constraint metrics reflect mouse and
cellular knockout phenotypes
Qingbo
Wang
0%
10%
20%
30%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofmouse
hetlethalknockoutgenes
Cell essential
Cell non−essential
0%
5%
10%
15%
20%
25%
0% 20% 40% 60% 80% 100%
LOEUF decile
Percentofessential/
non−essentialgenes
Hart et al., 2017
Eppig et al., 2015
Motenko et al., 2015

Constraint spectrum reflects
patterns of structural variation
• Called SVs in 14,245
individuals with 30X WGS
• 9,860 rare (<1%), biallelic,
autosomal LoF deletions
• Occurrence correlates with
SNV constraint metric
Ryan
Collins
Harrison
Brand
● ●
●
●
●
●
●
●
●
●
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
0% 20% 40% 60% 80% 100%
LOEUF decile
Aggregatedeletion
SVobserved/expected
Preprint coming soon!

Constraint metrics are correlated with
biological relevance
Protein-protein interactions
●
●
● ●
●
●
●
● ●
●
5
10
15
20
25
0% 20% 40% 60% 80% 100%
LOEUF decile
Meannumberof
protein−proteininteractions
Gene expression
●
● ● ●
●
●
●
●
●
●
0
10
20
30
0% 20% 40% 60% 80% 100%
LOEUF decile
Numberoftissueswhere
canonicaltranscriptisexpressed
Disease-associated
●
●
●
● ●
●
●
●
●
●
0%
10%
20%
0% 20% 40% 60% 80% 100%
LOEUF decile
ProportioninOMIM

Population-based constraint
• Repeating this constraint
calculation for each population
(8,128 individuals per
population)
• Lower mean o/e and LOEUF
in African/African-Americans
• Consistent with increased
efficiency of selection in
populations with larger effective
population sizes
●
●
●
●
●
0.45
0.50
0.55
0.60
African/
African−American Latino
East Asian
European
South Asian
LOEUF

●
●
●
●
●
●
●● ●● ●● ●
●
●● ●● ●●
synonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymoussynonymous
pLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoFpLoF
0
5
10
15
0% 20% 40% 60% 80% 100%
LOEUF decile
Rateratiofordenovo
variantsinID/DDcases
comparedtocontrols
Constraint improves rare disease diagnosis
• Patients with developmental
delay/intellectual disability are
15X more likely to have an de
novo LoF in a constrained gene
• 8,095 de novos in 5,305 cases
• 2,623 de novos in 2,179 controls
• Integrating expression data
improves this further
Jack
Kosmicki
Beryl
Cummings
broad.io/tx_annotation

Constraint informs common disease etiologies
• Compared to genome-wide
background, SNPs near
constrained genes are
enriched in their contribution
to heritability of common traits
• In particular, traits that
previously1 showed an
enrichment of ultra-rare
variants are also enriched
among constrained genes
●
●
●
●
●
●
● ●
●
●
1.0
1.2
1.4
0% 20% 40% 60% 80% 100%
LOEUF decile
Partitioningheritability
enrichment
Schizoprenia
Qualifications: College or University degree
Duration to first press of snap−button in each round
Educational attainment Bipolar
10-2
10
-4
10
-6
10-8
10-10
10
-12
10
-14
Activities
Cardiovascular
Cognitive
Environment
Hematological
Metabolic
Nutritional
Ophthalmological
Psychiatric
Reproduction
Respiratory
Skeletal
Social Interactions
Other
Traitenrichment
p−value
1Ganna et al. 2018 AJHG

Data publicly released with no publication restrictions
gnomad.broadinstitute.org
Matt
Solomonson
Nick
Watts
Gene model with
transcripts
Pathogenic Clinvar
Variants
Dataset
selection box
Tissue
isoform
expression
Constraint
metrics

Coming soon! Structural variant calls in the browser
gnomad.broadinstitute.org
Matt
Solomonson
Nick
Watts

• broad.io/gnomad_lof – Flagship LoF manuscript (these slides)
• broad.io/gnomad_drugs – Applications to drug targets
• broad.io/gnomad_lrrk2 – LRRK2 LoF carriers do not exhibit phenotypes
• broad.io/gnomad_uorfs – Deleterious 5’ UTR variants
• broad.io/tx_annotation – Transcript-based annotation
• Coming soon:
• broad.io/gnomad_mnv – Mechanisms of multi-nucleotide variants
• broad.io/gnomad_sv – Structural variant map of 15K genomes

Acknowledgments
• Laurent Francioli
• Grace Tiao
• Kaitlin Samocha
• Ben Neale
• Jessica Alföldi
• Kristen Laricchia
• Genomics Platform
• Data Science
Platform
• Daniel MacArthur
• Mark Daly
• Mike Talkowski
• Beryl Cummings
• Matt Solomonson
• Ben Weisburd
• Hail team
• Tim Poterba
• Cotton Seed
• Analytic and
Translational
Genetics Unit
• Genome Aggregation Database (gnomAD)
• Full list of consortia and contributors at:
gnomad.broadinstitute.org/about
broad.io/gnomad_lof

gnomAD at AGBT

Recommended

Recommended

More Related Content

What's hot

What's hot (17)

Similar to gnomAD at AGBT

Similar to gnomAD at AGBT (20)

Recently uploaded

Recently uploaded (20)

gnomAD at AGBT

Editor's Notes