gnomAD at AGBT

Slides from AGBT - Konrad Karczewski - March 2, 2019

Published in: Science

  1. Variation across 141,456 individuals reveals the spectrum of loss-of-function intolerance of the human genome. Konrad Karczewski, March 2, 2019. @konradjk, broad.io/gnomad_lof
  2. Range of LoF impact
     [Figure labels: embryonic lethal, recessive disease, non-essential, complex disease, beneficial, haploinsufficient disease]
  3. Identifying true LoF variants is challenging
     • LoFs are rare
     • LoFs are enriched for artifacts
  4. Identifying true LoF variants is challenging
     • LoFs are rare
     • LoFs are enriched for artifacts
  5. Increasing the scale of reference databases
     • gnomAD: 125,748 exomes and 15,708 whole genomes
  6. gnomAD 2.1.1
     • Data provided by 109 PIs
     • 1.3 and 1.6 petabytes of BAM files
     • Uniformly processed and joint called
     • 12 and 24 terabyte VCFs
     • Developed a novel QC pipeline
     • Complete pipeline publicly available: broad.io/gnomad_qc
     • All QC and analysis performed using Hail: hail.is
     • Scalable to thousands of CPUs
     • Enabled rapid iteration (few hours for each component, few days for entire process)
     Credits: Grace Tiao, Laurent Francioli, Broad Genomics Platform, Broad Data Sciences Platform, Tim Poterba, Cotton Seed
  7. gnomAD 2.1.1
     • Sub-continental ancestry
     • Subsets:
       • controls-only
       • non-neuro/non-psychiatric
       • non-cancer
       • non-TOPMed Bravo
     Credits: Grace Tiao, Laurent Francioli
     http://gnomad.broadinstitute.org
  8. Staggering amounts of variation
     • gnomAD 2.1 contains:
       • 230M variants in 15,708 genomes
       • 15M variants in 125,748 exomes
     [Figure: number of synonymous, missense, and pLoF variants observed vs. sample size, up to 125,748]
  9. Staggering amounts of pLoFs
     • gnomAD 2.1 contains:
       • 230M variants in 15,708 genomes
       • 15M variants in 125,748 exomes
     • Of these, we observe 515,326 predicted loss-of-function (pLoF) variants
       • Stop-gained
       • Essential splice
       • Frameshift indel
     [Figure: number of pLoF variants observed vs. sample size, up to 125,748]
  10. Identifying true LoF variants is challenging
      • LoFs are rare
      • LoFs are enriched for artifacts
  11. LOFTEE removes benign variation
      • LoF filtering plugin to VEP, LOFTEE
      • Variants retained by LOFTEE are:
        • more rare, and thus
        • more deleterious
      • After filtering, we discover 443,769 high-confidence pLoFs in gnomAD
      https://github.com/konradjk/loftee
      [Figure: MAPS for synonymous, missense, low-confidence pLoF, and high-confidence pLoF variants]
  12. Detecting genes depleted for pLoFs
      • Mutational model that predicts the number of SNVs in a given functional class we would expect to see in each gene in a cohort
      • Now incorporating methylation, improved coverage correction, LOFTEE
      • Previously transformed into the probability of LoF intolerance (pLI)
      • Applying to 125,748 gnomAD exomes
      • Median of 17.3 pLoFs expected per gene
      • Direct estimate of observed/expected ratio
      • New scores: LOEUF (loss-of-function observed/expected upper bound fraction)
        • Upper bound of 90% confidence interval
      Credit: Kaitlin Samocha (Samocha et al. 2014; Lek et al. 2016)
  13. Most genes are depleted of LoF variation
      Gene    Phenotype                       Class       Observed  Expected  Obs/Exp (CI)
      MED13L  Severe intellectual disability  Synonymous  462       465       0.993 (0.92-1.07)
      MED13L  Severe intellectual disability  pLoF        0         102       0 (0-0.029)
      FNDC3B  None                            Synonymous  271       266       1.02 (0.92-1.13)
      FNDC3B  None                            pLoF        0         68        0 (0-0.043)
      • Many are extremely depleted (<20% observed compared to expected)
      • Including most known (curated) haploinsufficient genes
      • Using upper bound of confidence interval corrects for small genes
      [Figures: number of genes vs. observed/expected; number of genes vs. LOEUF]
  14. Resolving the spectrum of LoF intolerance
      • Binning this spectrum into deciles
      [Figure: percent of gene list per LOEUF decile for haploinsufficient, autosomal recessive, and olfactory genes. Low deciles: more depleted, more constrained. High deciles: more tolerant, less constrained.]
  15. Resolving the spectrum of LoF intolerance
      • Known haploinsufficient genes have ~10% of the expected pLoFs
      [Figure: percent of gene list per LOEUF decile for haploinsufficient, autosomal recessive, and olfactory genes]
  16. Resolving the spectrum of LoF intolerance
      • Autosomal recessive genes are centered around 60% of expected
      [Figure: percent of gene list per LOEUF decile for haploinsufficient, autosomal recessive, and olfactory genes]
      Gene list from: Blekhman et al., 2008; Berg et al., 2013
  17. Resolving the spectrum of LoF intolerance
      • Some genes, e.g. olfactory receptors, are unconstrained
      [Figure: percent of gene list per LOEUF decile for haploinsufficient, autosomal recessive, and olfactory genes]
  18. Constraint metrics reflect mouse and cellular knockout phenotypes
      Credit: Qingbo Wang
      [Figures: percent of mouse het lethal knockout genes per LOEUF decile; percent of cell essential and cell non-essential genes per LOEUF decile]
      Hart et al., 2017; Eppig et al., 2015; Motenko et al., 2015
  19. Constraint spectrum reflects patterns of structural variation
      • Called SVs in 14,245 individuals with 30X WGS
      • 9,860 rare (<1%), biallelic, autosomal LoF deletions
      • Occurrence correlates with SNV constraint metric
      Credits: Ryan Collins, Harrison Brand
      [Figure: aggregate deletion SV observed/expected per LOEUF decile]
      Preprint coming soon!
  20. Constraint metrics are correlated with biological relevance
      [Figure panels, each plotted per LOEUF decile:]
      • Protein-protein interactions: mean number of protein-protein interactions
      • Gene expression: number of tissues where canonical transcript is expressed
      • Disease-associated: proportion in OMIM
  21. Population-based constraint
      • Repeating this constraint calculation for each population (8,128 individuals per population)
      • Lower mean o/e and LOEUF in African/African-Americans
      • Consistent with increased efficiency of selection in populations with larger effective population sizes
      [Figure: LOEUF by population: African/African-American, Latino, East Asian, European, South Asian]
  22. Constraint improves rare disease diagnosis
      • Patients with developmental delay/intellectual disability are 15X more likely to have a de novo LoF in a constrained gene
      • 8,095 de novos in 5,305 cases
      • 2,623 de novos in 2,179 controls
      • Integrating expression data improves this further
      Credits: Jack Kosmicki, Beryl Cummings
      broad.io/tx_annotation
      [Figure: rate ratio for de novo variants in ID/DD cases compared to controls, per LOEUF decile, for synonymous and pLoF variants]
  23. Constraint informs common disease etiologies
      • Compared to genome-wide background, SNPs near constrained genes are enriched in their contribution to heritability of common traits
      • In particular, traits that previously showed an enrichment of ultra-rare variants (Ganna et al. 2018, AJHG) are also enriched among constrained genes
      [Figure: partitioning heritability enrichment per LOEUF decile]
      [Figure: trait enrichment p-value (10^-2 to 10^-14) by trait category: Activities, Cardiovascular, Cognitive, Environment, Hematological, Metabolic, Nutritional, Ophthalmological, Psychiatric, Reproduction, Respiratory, Skeletal, Social Interactions, Other. Labeled traits: Schizophrenia; Qualifications: College or University degree; Duration to first press of snap-button in each round; Educational attainment; Bipolar]
  24. Data publicly released with no publication restrictions
      gnomad.broadinstitute.org
      Credits: Matt Solomonson, Nick Watts
      [Browser screenshot labels: gene model with transcripts, pathogenic ClinVar variants, dataset selection box, tissue isoform expression, constraint metrics]
  25. Coming soon! Structural variant calls in the browser
      gnomad.broadinstitute.org
      Credits: Matt Solomonson, Nick Watts
  26. Related manuscripts
      • broad.io/gnomad_lof: Flagship LoF manuscript (these slides)
      • broad.io/gnomad_drugs: Applications to drug targets
      • broad.io/gnomad_lrrk2: LRRK2 LoF carriers do not exhibit phenotypes
      • broad.io/gnomad_uorfs: Deleterious 5’ UTR variants
      • broad.io/tx_annotation: Transcript-based annotation
      • Coming soon:
        • broad.io/gnomad_mnv: Mechanisms of multi-nucleotide variants
        • broad.io/gnomad_sv: Structural variant map of 15K genomes
  27. Acknowledgments
      • Laurent Francioli
      • Grace Tiao
      • Kaitlin Samocha
      • Ben Neale
      • Jessica Alföldi
      • Kristen Laricchia
      • Genomics Platform
      • Data Science Platform
      • Daniel MacArthur
      • Mark Daly
      • Mike Talkowski
      • Beryl Cummings
      • Matt Solomonson
      • Ben Weisburd
      • Hail team
        • Tim Poterba
        • Cotton Seed
      • Analytic and Translational Genetics Unit
      • Genome Aggregation Database (gnomAD)
      • Full list of consortia and contributors at: gnomad.broadinstitute.org/about
      broad.io/gnomad_lof
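
Slides 12 to 14 describe LOEUF as the upper bound of a 90% confidence interval on the observed/expected pLoF ratio, then bin genes into deciles. The slides do not give the computation, so the following is only a minimal sketch of one way such a bound could be obtained. It assumes a Poisson likelihood for the observed count and a grid of candidate ratios from 0 to 2; the function names, the grid step, and the grid limit are all assumptions for illustration, not the gnomAD pipeline.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing k events when mu are expected."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def oe_upper_bound(observed, expected, level=0.90, step=0.001, max_ratio=2.0):
    """Upper bound of a two-sided interval on the observed/expected ratio.

    Assumption: observed ~ Poisson(ratio * expected), evaluated on a grid of
    candidate ratios (step and max_ratio are illustrative choices). The
    likelihood is normalized over the grid and the bound is the first ratio
    at which the cumulative mass reaches 1 - (1 - level) / 2.
    """
    tail = (1.0 - level) / 2.0
    n_points = int(round(max_ratio / step))
    ratios = [i * step for i in range(n_points + 1)]
    weights = [poisson_pmf(observed, r * expected) for r in ratios]
    total = sum(weights)
    cumulative = 0.0
    for ratio, weight in zip(ratios, weights):
        cumulative += weight / total
        if cumulative >= 1.0 - tail:
            return ratio
    return max_ratio


def assign_deciles(scores):
    """Rank genes by score and map each to a decile bin 0..9 (0 = lowest scores)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {gene: (rank * 10) // n for rank, gene in enumerate(ranked)}


# MED13L on slide 13: 0 pLoFs observed, 102 expected.
print(f"{oe_upper_bound(0, 102):.3f}")  # prints 0.029
```

Under these assumptions the sketch gives 0.029 for MED13L, in line with the 0-0.029 interval on slide 13. Because the bound widens when fewer variants are expected, a small gene with zero observed pLoFs receives a higher value than a large one, which is the correction for small genes that slide 13 mentions.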

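
Slide 22 reports total de novo counts in cases and controls. As a small worked example of the rate-ratio arithmetic behind that slide's figure, the sketch below compares per-individual de novo rates using only those totals. It assumes the totals cover all de novo variants; the 15X figure on the slide concerns LoF variants in constrained genes and needs per-decile counts that the slides do not give. The function name `rate_ratio` is hypothetical.

```python
def rate_ratio(case_events, n_cases, control_events, n_controls):
    """Ratio of per-individual event rates, cases relative to controls."""
    case_rate = case_events / n_cases
    control_rate = control_events / n_controls
    return case_rate / control_rate


# Totals from slide 22: 8,095 de novos in 5,305 cases, 2,623 in 2,179 controls.
overall = rate_ratio(8095, 5305, 2623, 2179)
print(f"{overall:.2f}")  # prints 1.27
```

The same function could be applied per LOEUF decile and per variant class once those counts are available, which is what the figure on slide 22 plots.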